Re: Add recovery to pg_control and remove backup_label - Mailing list pgsql-hackers
From | David Steele |
---|---|
Subject | Re: Add recovery to pg_control and remove backup_label |
Date | |
Msg-id | 188e97f4-69d9-4542-b0c1-852fa6b8319b@pgmasters.net |
In response to | Re: Add recovery to pg_control and remove backup_label (Andres Freund <andres@anarazel.de>) |
Responses | Re: Add recovery to pg_control and remove backup_label |
List | pgsql-hackers |
On 11/21/23 12:41, Andres Freund wrote:
>
> On 2023-11-21 07:42:42 -0400, David Steele wrote:
>> On 11/20/23 19:58, Andres Freund wrote:
>>> On 2023-11-21 08:52:08 +0900, Michael Paquier wrote:
>>>> On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote:
>>>>> Given that, I wonder if what we should do is to just add a new field to
>>>>> pg_control that says "error out if backup_label does not exist", that we set
>>>>> when creating a streaming base backup
>>>>
>>>> That would mean that one still needs to take an extra step to update a
>>>> control file with this byte set, which is something you had a concern
>>>> with in terms of compatibility when it comes to external backup
>>>> solutions because more steps are necessary to take a backup, no?
>>>
>>> I was thinking we'd just set it in the pg_basebackup style path, and we'd
>>> error out if it's set and backup_label is present. But we'd still use
>>> backup_label without the pg_control flag set.
>>>
>>> So it'd just provide a cross-check that backup_label was not removed for
>>> pg_basebackup style backup, but wouldn't do anything for external backups. But
>>> imo the proposal to just us pg_control doesn't actually do anything for
>>> external backups either - which is why I think my proposal would achieve as
>>> much, for a much lower price.
>>
>> I'm not sure why you think the patch under discussion doesn't do anything
>> for external backups. It provides the same benefits to both pg_basebackup
>> and external backups, i.e. they both receive the updated version of
>> pg_control.
>
> Sure. They also receive a backup_label today. If an external solution forgets
> to replace pg_control copied as part of the filesystem copy, they won't get an
> error after the remove of backup_label, just like they don't get one today if
> they don't put backup_label in the data directory. Given that users don't do
> the right thing with backup_label today, why can we rely on them doing the
> right thing with pg_control?

I think reliable backup software does the right thing with backup_label, but if the user starts getting errors on recovery they then decide to remove backup_label. I know we can't do much about bad backup software, but we can at least make this a bit more resistant to user error after the fact. It doesn't help that one of our hints suggests removing backup_label.

In highly automated systems, the user might not even know they just restored from a backup. They are only in the loop because the restore failed and they are trying to figure out what is going wrong. When they remove backup_label the cluster comes up just fine. Victory!

This is the scenario I've seen most often -- not the backup/restore process getting it wrong, but the user removing backup_label on their own initiative. And because it yields such a positive result, at least initially, they remember in the future that the thing to do is to remove backup_label whenever they see the error.

If they only have pg_control, then their only choice is to get it right or run pg_resetwal. Most users have no knowledge of pg_resetwal, so it will take them longer to get there. Also, I think that tool makes it pretty clear that corruption will result and that the only thing to do after using it is a logical dump and restore.

There are plenty of ways a user can mess things up. What I'd like to prevent is the appearance of everything being OK when in fact they have corrupted their cluster. That's the situation we have now with backup_label.

Is this new solution perfect? No, but I do think it checks several boxes, and is a worthwhile improvement.

Regards,
-David