Re: trying again to get incremental backup - Mailing list pgsql-hackers

From David Steele
Subject Re: trying again to get incremental backup
Date
Msg-id 11b38a96-6ded-4668-b772-40f992132797@pgmasters.net
Whole thread Raw
In response to Re: trying again to get incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: trying again to get incremental backup
List pgsql-hackers
On 10/23/23 11:44, Robert Haas wrote:
> On Fri, Oct 20, 2023 at 11:30 AM David Steele <david@pgmasters.net> wrote:
>>
>> I don't plan to stand in your way on this feature. I'm reviewing what
>> patches I can out of courtesy and to be sure that nothing adjacent to
>> your work is being affected. My apologies if my reviews are not meeting
>> your expectations, but I am contributing as my time constraints allow.
> 
> Sorry, I realize reading this response that I probably didn't do a
> very good job writing that email and came across sounding like a jerk.
> Possibly, I actually am a jerk. Whether it just sounded like it or I
> actually am, I apologize. 

That was the way it came across, though I prefer to think it was 
unintentional. I certainly understand how frustrating dealing with a 
large and uncertain patch can be. Either way, I appreciate the apology.

Now onward...

> But your last paragraph here gets at my real
> question, which is whether you were going to try to block the feature.
> I recognize that we have different priorities when it comes to what
> would make most sense to implement in PostgreSQL, and that's OK, or at
> least, it's OK with me. 

This seem perfectly natural to me.

> I also don't have any particular expectation
> about how much you should review the patch or in what level of detail,
> and I have sincerely appreciated your feedback thus far. If you are
> able to continue to provide more, that's great, and if that's not,
> well, you're not obligated. What I was concerned about was whether
> that review was a precursor to a vigorous attempt to keep the main
> patch from getting committed, because if that was going to be the
> case, then I'd like to surface that conflict sooner rather than later.
> It sounds like that's not an issue, which is great.

Overall I would say I'm not strongly for or against the patch. I think 
it will be very difficult to use in a manual fashion, but automation is 
they way to go in general so that's not necessarily and argument against.

However, this is an area of great interest to me so I do want to at 
least make sure nothing is being impacted adjacent to the main goal of 
this patch. So far I have seen no sign of that, but that has been a 
primary goal of my reviews.

> At the risk of drifting into the fraught question of what I *should*
> be implementing rather than the hopefully-less-fraught question of
> whether what I am actually implementing is any good, I see incremental
> backup as a way of removing some of the use cases for the low-level
> backup API. If you said "but people still will have lots of reasons to
> use it," I would agree; and if you said "people can still screw things
> up with pg_basebackup," I would also agree. Nonetheless, most of the
> disasters I've personally seen have stemmed from the use of the
> low-level API rather than from the use of pg_basebackup, though there
> are exceptions. 

This all makes sense to me.

> I also think a lot of the use of the low-level API is
> driven by it being just too darn slow to copy the whole database, and
> incremental backup can help with that in some circumstances. 

I would argue that restore performance is *more* important than backup 
performance and this patch is a step backward in that regard. Backups 
will be faster and less space will be used in the repository, but 
restore performance is going to suffer. If the deltas are very small the 
difference will probably be negligible, but as the deltas get large (and 
especially if there are a lot of them) the penalty will be more noticeable.

> Also, I
> have worked fairly hard to try to make sure that if you misuse
> pg_combinebackup, or fail to use it altogether, you'll get an error
> rather than silent data corruption. I would be interested to hear
> about scenarios where the checks that I've implemented can be defeated
> by something that is plausibly described as stupidity rather than
> malice. I'm not sure we can fix all such cases, but I'm very alert to
> the horror that will befall me if user error looks exactly like a bug
> in the code. For my own sanity, we have to be able to distinguish
> those cases. 

I was concerned with the difficulty of trying to stage the correct 
backups for pg_combinebackup, not whether it would recognize that the 
needed data was not available and then error appropriately. The latter 
is surmountable within pg_combinebackup but the former is left up to the 
user.

> Moreover, we also need to be able to distinguish
> backup-time bugs from reassembly-time bugs, which is why I've got the
> pg_walsummary tool, and why pg_combinebackup has the ability to emit
> fairly detailed debugging output. I anticipate those things being
> useful in investigating bug reports when they show up. I won't be too
> surprised if it turns out that more work on sanity-checking and/or
> debugging tools is needed, but I think your concern about people
> misusing stuff is bang on target and I really want to do whatever we
> can to avoid that when possible and detect it when it happens.

The ability of users to misuse tools is, of course, legendary, so that 
all sounds good to me.

One note regarding the patches. I feel like 
v5-0005-Prototype-patch-for-incremental-backup should be split to have 
the WAL summarizer as one patch and the changes to base backup as a 
separate patch.

It might not be useful to commit one without the other, but it would 
make for an easier read. Just my 2c.

Regards,
-David



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: post-recovery amcheck expectations
Next
From: David Steele
Date:
Subject: Re: Add trailing commas to enum definitions