Re: block-level incremental backup - Mailing list pgsql-hackers

From Robert Haas
Subject Re: block-level incremental backup
Msg-id CA+TgmoYH4T_BH0wmDCCvfdaA6jdq8h-5bOWL46HQsP1DC2mfig@mail.gmail.com
In response to Re: block-level incremental backup  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Wed, Apr 24, 2019 at 12:57 PM Stephen Frost <sfrost@snowman.net> wrote:
> So, I had a thought about that when I was composing the last email and
> while I'm still unsure about it, maybe it'd be useful to mention it
> here- do we really need a list of every *file*, or could we reduce that
> down to a list of relations + forks for the main data directory, and
> then always include whatever other directories/files are appropriate?

I'm not quite sure what the difference is here.  I agree that we could
try to compact the list of file names by saying 16384 (24 segments)
instead of 16384, 16384.1, ..., 16384.23, but I doubt that saves
anything meaningful.  I don't see how we can leave anything out
altogether.  If there's a filename called boaty.mcboatface in the
server directory, I think we've got to back it up, and that won't
happen unless the client knows that it is there, and it won't know
unless we include it in a list.
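
To make the compaction idea concrete, here is a rough sketch (hypothetical helper, not anything in PostgreSQL) of expanding a "relfilenode (N segments)" entry back into the per-segment file names the server actually uses, i.e. the base file plus numbered .1, .2, ... extensions:

```python
def expand_segments(relfilenode, nsegments):
    # First segment has no suffix; the rest are <relfilenode>.1, .2, ...
    names = [str(relfilenode)]
    names += [f"{relfilenode}.{i}" for i in range(1, nsegments)]
    return names

# expand_segments(16384, 24) yields "16384", "16384.1", ..., "16384.23"
```

The point being that this only shortens the manifest; it doesn't let you omit anything the client wouldn't otherwise know about.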

> When it comes to operating in chunks, well, if we're getting a list of
> relations instead of files, we do have this thing called cursors..

Sure... but they don't work for replication commands and I am
definitely not volunteering to change that.

> I would think the client would be able to just ask for the list of
> modified files, when it comes to building up the list of files to ask
> for, which could potentially be done based on mtime instead of by WAL
> scanning or by scanning the files themselves.  Don't get me wrong, I'd
> prefer that we work based on the WAL, since I have more confidence in
> that, but certainly quite a few of the tools do work off mtime these
> days and while it's not perfect, the risk/reward there is pretty
> palatable to a lot of people.

That approach, as with a few others that have been suggested, requires
that the client have access to the previous backup, which makes me
uninterested in implementing it.  I want a version of incremental
backup where the client needs to know the LSN of the previous backup
and nothing else.  That way, if you store your actual backups on a
tape drive in an airless vault at the bottom of the Pacific Ocean, you
can still take an incremental backup against them, as long as you
remember to note the LSNs before you ship the backups to the vault.
Woohoo!  It also allows for the wire protocol to be very simple and
the client to be very simple; neither of those things is essential,
but both are nice.
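
Noting the LSN is cheap: the start-of-backup LSN appears on the START WAL LOCATION line of backup_label.  A minimal sketch (hypothetical helper, assuming you have the backup_label contents in hand) of recording it before the backup goes into the vault:

```python
import re

def note_start_lsn(backup_label_text):
    # Pull the pg_lsn-style "X/Y" value out of the START WAL LOCATION line.
    m = re.search(r"^START WAL LOCATION: ([0-9A-Fa-f]+/[0-9A-Fa-f]+)",
                  backup_label_text, re.MULTILINE)
    if m is None:
        raise ValueError("no START WAL LOCATION line found")
    return m.group(1)

label = "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\n"
# note_start_lsn(label) -> "0/2000028"
```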

Also, I think using mtimes is just asking to get burned.  Yeah, almost
nobody will, but an LSN-based approach is more granular (block level)
and more reliable (can't be fooled by resetting a clock backward, or
by a filesystem being careless with file metadata), so I think it
makes sense to focus on getting that to work.  It's worth keeping in
mind that there may be somewhat different expectations for an external
tool vs. a core feature.  Stupid as it may sound, I think people using
an external tool are more likely to do things read the directions, and
those directions can say things like "use a reasonable filesystem and
don't set your clock backward."  When stuff goes into core, people
assume that they should be able to run it on any filesystem on any
hardware where they can get it to work and it should just work.  And
you also get a lot more users, so even if the percentage of people not
reading the directions were to stay constant, the actual number of
such people will go up a lot. So picking what we seem to both agree to
be the most robust way of detecting changes seems like the way to go
from here.

> I suspect some of that's driven by how they get solved and if we decide
> we have to solve all of them.  With things like MAX_RATE + incremental
> backups, I wonder how that's going to end up working, when you have the
> option to apply the limit to the network, or to the disk I/O.  You might
> have addressed that elsewhere, I've not looked, and I'm not too
> particular about it personally either, but a definition could be "max
> rate at which we'll read the file you asked for on this connection" and
> that would be pretty straight-forward, I'd think.

I mean, it's just so people can tell pg_basebackup what rate they want
via a command-line option and have it happen like that.  They don't
care about the rates for individual files.

> > Issue #1: If you manually add files to your backup, remove files from
> > your backup, or change files in your backup, bad things will happen.
> > There is fundamentally nothing we can do to prevent this completely,
> > but it may be possible to make the system more resilient against
> > ham-handed modifications, at least to the extent of detecting them.
> > That's maybe a topic for another thread, but it's an interesting one:
> > Andres and I were brainstorming about it at some point.
>
> I'd certainly be interested in hearing about ways we can improve on
> that.  I'm alright with it being on another thread as it's a broader
> concern than just what we're talking about here.

Might be a good topic to chat about at PGCon.

> > Issue #2: You can only restore an LSN-based incremental backup
> > correctly if you have a base backup whose start-of-backup LSN is
> > greater than or equal to the threshold LSN used to take the
> > incremental backup.  If #1 is not in play, this is just a simple
> > cross-check at restoration time: retrieve the 'START WAL LOCATION'
> > from the prior backup's backup_label file and the threshold LSN for
> > the incremental backup from wherever you decide to store it and
> > compare them; if they do not have the right relationship, ERROR.  As
> > to whether #1 might end up in play here, anything's possible, but
> > wouldn't manually editing LSNs in backup metadata files be pretty
> > obviously a bad idea?  (Then again, I didn't really think the whole
> > backup_label thing was that confusing either, and obviously I was
> > wrong about that.  Still, editing a file requires a little more work
> > than removing it... you have to not only lie to the system, you have
> > to decide which lie to tell!)
>
> Yes, that'd certainly be at least one cross-check, but what if you've
> got an incremental backup based on a prior incremental backup that's
> based on a prior full, and you skip the incremental backup inbetween
> somehow?  Or are we just going to state outright that we don't support
> incremental-on-incremental (in which case, all backups would actually be
> either 'full' or 'differential' in the pgBackRest parlance, anyway, and
> that parlance comes from my recollection of how other tools describe the
> different backup types, but that was from many moons ago and might be
> entirely wrong)?

I have every intention of supporting that case, just as I described in
my original email, and the algorithm that I just described handles it.
You just have to repeat the checks for every backup in the chain.  If
you have a backup A, and a backup B intended as an incremental vs. A,
and a backup C intended as an incremental vs. B, then the threshold
LSN for C is presumably the starting LSN for B, and the threshold LSN
for B is presumably the starting LSN for A.  If you try to restore
A-B-C you'll check C vs. B and find that all is well and similarly for
B vs. A.  If you try to restore A-C, you'll find out that A's start
LSN precedes C's threshold LSN and error out.
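
A sketch of that restore-time cross-check (hypothetical tool code, with LSNs parsed into comparable tuples rather than real pg_lsn values): each incremental's threshold LSN must not exceed the start LSN of the backup beneath it in the chain.

```python
def parse_lsn(text):
    # "X/Y" pg_lsn notation -> comparable (hi, lo) tuple of hex values.
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))

def check_chain(backups):
    # backups: base-first list of dicts with 'start_lsn', and
    # 'threshold_lsn' for each incremental.  Error out if any prior
    # backup's start LSN precedes the next one's threshold LSN.
    for prior, incr in zip(backups, backups[1:]):
        if parse_lsn(prior["start_lsn"]) < parse_lsn(incr["threshold_lsn"]):
            raise ValueError("missing intermediate backup in chain")

a = {"start_lsn": "0/1000000"}
b = {"start_lsn": "0/2000000", "threshold_lsn": "0/1000000"}
c = {"start_lsn": "0/3000000", "threshold_lsn": "0/2000000"}
check_chain([a, b, c])   # A-B-C: every link checks out
# check_chain([a, c])    # A-C: raises, A's start LSN precedes C's threshold
```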

> Even if what we're talking about here is really only "differentials", or
> backups where the incremental contains all the changes from a prior full
> backup, if the only check is "full LSN is greater than or equal to the
> incremental backup LSN", then you have a potential problem that's larger
> than just the incrementals no longer being valid because you removed the
> full backup on which they were taken- you might think that an *earlier*
> full backup is the one for a given incremental and perform a restore
> with the wrong full/incremental matchup and end up with a corrupted
> database.

No, the proposed check is explicitly designed to prevent that.  You'd
get a restore failure (which is not great either, of course).

> management.  The idea of implementing block-level incrementals while
> pushing the backup management, expiration, and dependency between
> incrementals and fulls on to the user to figure out just strikes me as
> entirely backwards and, frankly, to be gratuitously 'itch scratching' at
> the expense of what users really want and need here.

Well, not everybody needs or wants the same thing.  I wouldn't be
proposing it if my employer didn't think it was gonna solve a real
problem...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


