Rich Freeman on 29 Sep 2013 05:18:07 -0700
Re: [PLUG] Spanning volumes with LVM (Ubuntu)
On Sat, Sep 28, 2013 at 6:26 PM, Matt Mossholder <matt@mossholder.com> wrote:
> There are several issues with utilizing RAID5 for MythTV storage. The
> biggest is that when you write data that doesn't fill an entire stripe,
> you end up having to read the stripe in first, so that you can then
> recalculate the parity of the new stripe.

That is really an issue with any striped RAID (anything but RAID1, basically); nothing MythTV-specific. I don't really buy that MythTV wiki page. Unless you're putting a lot of demand on your array, it shouldn't have trouble keeping up with multiple HD streams - mine certainly didn't, as long as you have decent RAM for the cache. Even with multiple streams, there is no reason it can't write data out in stripe-sized chunks as long as your buffers are sized right.

> I noticed when I was previously recording to RAID5 that I would get
> occasional corruption in the data. This disappeared completely when I
> rebuilt to use stand-alone recording drives.

A few years ago I had that problem too. I resolved it by disabling fsync in MythTV's ThreadedFileWriter routine and increasing its buffer sizes - just 3-4 lines of code, pretty trivial. In more recent versions it hasn't been a problem, so I don't bother anymore. The routine already trickles data into the OS file buffer, so all you need to do is comment out the line that fsyncs it and let the OS write it to disk in its own time. Not sure whether they fixed it or something else changed, but back in the ~0.24 days the buffer was pretty small (megabytes) and it would fsync really small chunks of it, so a nice hour-long show probably produced tens of thousands of fsyncs.

Ok, start of rant...
Recording multiple HD streams into small memory buffers and then dumping the data out in small chunks to multiple files, with an fsync on every write, is just brain-dead - and doing it on a striped RAID is certain to cause problems. There is no need to fsync at all for something like MythTV: you just write to the file, and it is the kernel's job to make sure it reaches disk. Without the fsync there is some risk that a crash loses a minute of video or whatever, but you're going to lose more than that while the system reboots and isn't recording anyway. The fsyncs from small buffers, in contrast, hammer the drive and cause buffer overruns. I have no idea why the MythTV authors did it that way; removing that fsync GREATLY improves performance.

For whatever reason there seem to be people out there who think you should fsync any time you write to a file if you don't want to lose the data. Fsync should really only be used when you're dealing with cross-network/cross-process transactions (such as in a database) to ensure that operations are consistent. You can go read the big long post by the ext3 maintainers about how what I just said is wrong, but then I'll just point you to the short post by Linus where he modified their filesystem code against their wishes so that it works just fine without all the fsyncs. Fsync puts an unnecessary constraint on the OS: it tells the kernel to forget all its smart buffer/cache management algorithms and write this one piece of data right now, no matter what. When every process acts that selfishly it kills the performance of any drive configuration, striped RAID even more so.

Ok, getting off the fsync soapbox... For most applications RAID5 is going to be perfectly fine, especially for things like MythTV that write large multimedia files. That said, the whole read-before-write issue in RAID is part of the reason I switched to btrfs.
The other issue is that I happened to run into parity errors, which expose a real design problem in mdadm RAID. If a drive outright fails, mdadm handles it just fine. But if for whatever reason the data on one drive silently changes, mdadm has no way to know which n-1 combination of drives holds the right data to rebuild the bad one. Btrfs and ZFS use block-level checksums, so it is always clear whether any particular block is right, and both will use redundant blocks, if available, to fix things.

The COW approach also handles modifications to existing files better: in-place modifications are basically transactional, and after a crash you end up with either the old data or the new data. I think ext3 also achieves that with Linus's change to the default behavior (that was the part he got into a fight with the maintainers over - they wanted it to lose existing data by default if you didn't fsync after every write).

Sorry about the rant. :)

Rich
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug