Mark Dominus on 6 Dec 2007 16:16:05 -0000



Re: [PLUG] [plug-announce] December 5th, 2007: "What's a file?" presented by Mark Jason Dominus


Jonathan Bringhurst:
> I just wanted to thank Mark for a great presentation. 

Thanks!  I am glad you liked it.  I hope I did not harass any one
person unduly.  If so, I apologize.

> As for the use of O_SYNC to prevent the kernel from buffering data
> to reduce context switches, I just wanted to mention that it's only
> for writing, I dunno why I was thinking it was for read()s
> there. It's only used in obscure cases anyway.

Yeah.  We did discuss this a bit during the talk.  The basic issue is
that when your process asks the kernel to write data:

        int bytes_written = write(file_descriptor,
                                  buffer,
                                  n_bytes);

the kernel normally copies the data from your buffer into a kernel
buffer and then reports success back to the process immediately, even
though the data is not on the disk yet.

Normally, the kernel writes out the buffer in due time, and the data
makes it to the disk, and you are happy because your process got to go
ahead and do some more work without having to wait for the disk, which
could take milliseconds.  ("A long time", as I so quaintly called it
yesterday.)  If some other process reads the data before it is
written, that is okay, because the kernel can give it the updated data
out of the buffer.

But if there is a catastrophe, say a power failure, then this
asynchronous writing technique has a serious problem:  you find out
that the data, which your process thought had been written, has been
lost.  

So there are a number of mechanisms in place to deal with this.  The
oldest is the "sync()" system call, which marks all the kernel buffers
to be written out ASAP.  All unix systems run a program called "init",
and one of init's primary duties is to call sync() every thirty
seconds or so, to make sure that the kernel buffers get flushed to
disk at least every thirty seconds, so that no crash will lose more
than about thirty seconds' worth of data.
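
In outline, that periodic flushing amounts to a loop like the sketch
below.  (This is just an illustration, not init's actual source.)

        #include <unistd.h>

        /* Schedule all dirty kernel buffers for writing, then wait
           about thirty seconds and do it again. */
        int main(void) {
            for (;;) {
                sync();
                sleep(30);
            }
        }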

(There is also a command-line program "sync" which just does a sync()
call and then exits, and old-time Unix sysadmins are in the habit of
halting the system with
        # sync
        # sync
        # sync
        # halt
because typing the second and third syncs gives the kernel time to
finish writing out the buffers that the first "sync" scheduled.  Although I
suspect that few of them know why they do this.  I swear I am not
making this up.)

But for really crucial data, sync() is not enough, because, although
it schedules the dirty kernel buffers to be written out, it *still*
returns before the data has actually reached the disk.

So there is also an fsync() call.  The process gives it a file
descriptor, and fsync() makes the process wait until all the buffers
associated with that descriptor have been written to disk, returning
success only if they have:

        if (fsync(fd)) {
          /* uh-oh, couldn't write the data! */
        } else { 
          /* data is now on the disk */
        }

The mail delivery agent will use this when it is writing your email to
your mailbox, to make sure that no mail is lost.
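
In outline the delivery code does something like the sketch below.
(The mailbox path and the deliver() function are made up for the
example; a real delivery agent also locks the mailbox and so on.)

        #include <fcntl.h>
        #include <string.h>
        #include <unistd.h>

        int deliver(const char *message) {
            int fd = open("/var/mail/mjd", O_WRONLY | O_APPEND);
            if (fd < 0)
                return -1;            /* couldn't even open the mailbox */
            if (write(fd, message, strlen(message)) < 0 || fsync(fd)) {
                close(fd);
                return -1;            /* the mail is NOT safely on the disk */
            }
            return close(fd);         /* the data made it to the disk */
        }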

Then there's an O_SYNC flag that the process can supply when it opens
the file for writing:

        int fd = open("blookus", O_WRONLY | O_SYNC);

This sets the O_SYNC flag in the file pointer object; whenever data is
written to this file pointer, the kernel, contrary to its usual
practice, will implicitly fsync() the descriptor after each write.
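
So, roughly speaking, writing through an O_SYNC descriptor behaves
like an ordinary write followed by an fsync().  Something like this
sketch, reusing buffer and n_bytes from the earlier example:

        /* With O_SYNC, write() does not return until the data is
           safely on the disk... */
        int fd = open("blookus", O_WRONLY | O_SYNC);
        write(fd, buffer, n_bytes);

        /* ...which is roughly equivalent to: */
        int fd2 = open("blookus", O_WRONLY);
        write(fd2, buffer, n_bytes);
        fsync(fd2);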

There's an interesting question that arises with this:  suppose you
fsync() a file.  That guarantees that the data will be written.  But
does it also guarantee that the mtime and the file extent of the file
will be updated?

On most systems, yes.  But on recent versions of Linux's ext2
filesystem, no.  Linus himself broke this as a sacrifice to the false
god of efficiency, a very bad decision in my opinion.

> Another random thing is the catching of interrupts when doing a system
> call, most notably EINTR. 

Have you ever read Richard Gabriel's  essay on "Worse is Better"?
This is his big example of how Unix is Worse.  

> if the return from the syscall is < 0 you need to check errno and handle
> it. 

Ah, but you don't need to check that if your program never catches
signals.  And that is why it sucks for programs to catch signals.
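
For the record, the usual defensive idiom looks something like the
sketch below; write_all() is just a name I made up for the example.

        #include <errno.h>
        #include <unistd.h>

        /* Keep writing until all n bytes are out, retrying when a
           signal interrupts the system call partway through. */
        ssize_t write_all(int fd, const char *buf, size_t n) {
            size_t done = 0;
            while (done < n) {
                ssize_t got = write(fd, buf + done, n - done);
                if (got < 0) {
                    if (errno == EINTR)
                        continue;     /* interrupted; just try again */
                    return -1;        /* a real error */
                }
                done += got;
            }
            return (ssize_t) done;
        }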
