Rich Freeman via plug on 6 Dec 2021 11:58:38 -0800



Re: [PLUG] Thoughts on MySQL from a developer


On Mon, Dec 6, 2021 at 1:43 PM Keith <kperry@daotechnologies.com> wrote:
>
> On 12/6/21 1:34 PM, Rich Freeman wrote:
> > On Mon, Dec 6, 2021 at 1:27 PM Keith via plug
> > <plug@lists.phillylinux.org> wrote:
> >> I don't generally do LVM snapshots but I do run nilfs2 volumes on systems where I need filesystem snapshotting.  I haven't tested that with Pg or MySQL since I handle such tasks at the system or VM level but I like the idea of using quiesce.  I'll have to read up more on that.
> >>
> > I suspect those instructions work fine for mysql/lvm, but I would
> > check the docs to ensure that there is an official way to ensure the
> > files are quiescent.  Just locking and flushing the tables doesn't
> > necessarily ensure the files are "clean," though it could.  I'd think
> > this would be fairly software-specific.
> Good point, though if you can't trust a lock in a database, that's
> kinda problematic.  I would hope that isn't the case.

I'm not saying the locks don't work.  I'm saying that just holding a
lock doesn't necessarily guarantee that a database will start up
cleanly using the copy of the file on-disk.  All a lock guarantees is
that nobody else can modify things.

If you sync an ext4 filesystem and don't have any open files, then the
data on-disk won't be changing.  Even so, if you snapshot it and
restore it, the filesystem will still be detected as not being clean on
the next mount, since it was never unmounted.  I'm not saying it isn't
"safe" - just that it isn't clean.

In general you just have to be careful when bypassing the official
backup solutions.

> >
> > I would look at the database vendor/developer documentation for
> > supported backup methods.  This will ensure that your scripts don't
> > mysteriously break, probably between the time of disaster recovery
> > testing and an actual disaster.  Backup is usually a consideration
> > everybody looks at when choosing a database so any reputable product
> > will have plenty of documentation on how to do it in a supported way.
> > It is common that databases can be very large and take a long time to
> > dump, so they'll have some mechanism to do so safely.
>
> That's where log shipping comes in, or what PostgreSQL calls PITR.  I
> haven't played around with that either, since my databases are a
> reasonable size to dump.

I don't think we're talking about the same thing.  I think everybody
here is getting focused on how the database stores its data on disk.
Most database backup solutions provide a way to extract the data in a
form that doesn't depend on how it is actually stored.

Usually you just produce a big SQL script that, if replayed,
re-creates the entire database.
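
For example, the usual round trip looks something like this (database
names are placeholders):

    mysqldump mydb > mydb.sql          # dump schema and data as SQL text
    mysql -e 'CREATE DATABASE mydb_restored'
    mysql mydb_restored < mydb.sql     # replay the script to re-create it
    # pg_dump and psql give you the same kind of round trip on PostgreSQL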

Now, the database obviously uses transaction logs and so on
internally.  However, you can do a consistent backup of a database
that exists only in RAM and which doesn't support replication or
anything like that.

Log shipping is something else.  You can use it to maintain
replicas/etc, but a backup and a replica aren't exactly the same
thing.  You could probably use a replica as a type of backup though.
I imagine that with log-shipping the logs are checkpointed in a way
that ensures the data is always consistent.
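
As a rough sketch of what the PostgreSQL side of this looks like
(the paths here are assumptions; check the docs before relying on it):

    # archive each completed WAL segment somewhere safe
    psql -c "ALTER SYSTEM SET archive_mode = on;"
    psql -c "ALTER SYSTEM SET archive_command = 'cp %p /backup/wal/%f';"
    # (archive_mode only takes effect after a server restart)
    pg_basebackup -D /backup/base -X stream   # periodic base backup
    # a restore is the base backup plus replay of the archived WAL up
    # to whatever point in time you pick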

> I wouldn't mind seeing how MySQL handles such a case though.

So, I went ahead and stopped being lazy.

It appears that mysqldump only generates atomic dumps with InnoDB
tables and --single-transaction, or with --lock-tables and any table
type (though the lock is only per-database, so each database might be
inconsistent with the others).  You can also lock at the individual
table level, which is even less consistent.  Otherwise mysqldump
doesn't seem to provide any consistency guarantees.  If you use only
InnoDB tables with --single-transaction, though, it should behave the
way you'd expect such a tool to work (and it won't block other
database activity the way a lock would).
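
Concretely, the two modes look something like this (database names are
placeholders):

    # InnoDB only: one consistent snapshot; readers and writers keep running
    mysqldump --single-transaction --all-databases > all.sql

    # any table type: read-locks the tables, but only per database, so
    # different databases can still be inconsistent with each other
    mysqldump --lock-tables mydb > mydb.sql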

It isn't so much that your database needs to be huge to have problems.
It is just that the data can be modified while the dump is being
created.  If you run an update query that changes 10 rows in a large
table, then without any kind of lock or transaction you could end up
with a backup that has 5 rows with the old value and 5 rows with the
new value.  With a lock, the modification would be delayed until the
backup is complete.  With a transaction, the modification would not be
delayed, but the backup would see all 10 rows in their original state
if it started before the update was run.
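
To illustrate (the table and column names here are hypothetical):

    mysql mydb -e "UPDATE accounts SET status='closed' WHERE region='east';"
    # a plain dump walking the accounts table while this runs can capture
    # some rows with the old status and some with the new one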

The LVM snapshot approach that was posted would be like having
transaction support for all databases with only a very short period of
locking, but it isn't very portable, and the approach is MySQL-specific
and may or may not work in quite the same way in future versions.

-- 
Rich