Re: The danger of deleting backup_label - Mailing list pgsql-hackers

From David Steele
Subject Re: The danger of deleting backup_label
Msg-id 65825be1-e79a-46f4-9d9f-4ff95a10e378@pgmasters.net
In response to Re: The danger of deleting backup_label  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: The danger of deleting backup_label
List pgsql-hackers
Hi Thomas,

On 10/11/23 18:10, Thomas Munro wrote:
> 
> Even though I spent a whole bunch of time trying to figure out how to
> make concurrent reads of the control file sufficiently atomic for
> backups (pg_basebackup and low level filesystem tools), and we
> explored multiple avenues with varying results, and finally came up
> with something that basically works pretty well... actually I just
> hate all of that stuff, and I'm hoping to be able to just withdraw
> https://commitfest.postgresql.org/45/4025/ and chalk it all up to
> discovery/education and call *this* thread the real outcome of that
> preliminary work.
> 
> So I'm +1 on the idea of putting a control file image into the backup
> label and I'm happy that you're looking into it.

Well, hopefully this thread will *at least* be the solution going 
forward. Not sure about a back patch yet, see below...

> We could just leave the control file out of the base backup
> completely, as you said, removing a whole foot-gun.  

That's the plan.

> People following
> the 'low level' instructions will still get a copy of the control file
> from the filesystem, and I don't see any reliable way to poison that
> file without also making it so that a crash wouldn't also be prevented
> from recovering.  I have wondered about putting extra "fingerprint"
> information into the control file such as the file's path and inode
> number etc, so that you can try to distinguish between a control file
> written by PostgreSQL, and a control file copied somewhere else, but
> that all feels too fragile, and at the end of the day, people
> following the low level backup instructions had better follow the low
> level backup instructions (hopefully via the intermediary of an
> excellent external backup tool).

Not sure about the inode idea, because it seems OK for people to move a 
cluster elsewhere under a variety of circumstances. I do have an idea 
about how to mark a cluster in "recovery to consistency" mode, but not 
quite sure how to atomically turn that off at the end of recovery to 
consistency. I have some ideas I'll work on though.
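
To make the objection to the "fingerprint" idea concrete, here is a minimal sketch in Python. It is not anything PostgreSQL does today: the names Fingerprint, take_fingerprint and looks_like_original are hypothetical, and the fingerprint is assumed to be nothing more than the file's resolved path and inode number.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical fingerprint: where the control file lived when written."""
    path: str
    inode: int


def take_fingerprint(control_path: str) -> Fingerprint:
    # Record the resolved path and the inode number of the file as it is now.
    st = os.stat(control_path)
    return Fingerprint(path=os.path.realpath(control_path), inode=st.st_ino)


def looks_like_original(control_path: str, recorded: Fingerprint) -> bool:
    # True only if the file is still at the same path with the same inode.
    return take_fingerprint(control_path) == recorded
```

A copy made by a backup tool fails this check, which is the intent. A cluster that was legitimately moved to another directory or host fails it too, because the recorded path (and usually the inode) no longer matches, so the check cannot tell the two cases apart.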

> As Stephen mentioned[1], we could perhaps also complain if both backup
> label and control file exist, and then hint that the user should
> remove the *control file* (not the backup label!).  I had originally
> suggested we would just overwrite the control file, but by explicitly
> complaining about it we would also bring the matter to tool/script
> authors' attention, ie that they shouldn't be backing that file up, or
> should be removing it in a later step if they copy everything.  He
> also mentions that there doesn't seem to be anything stopping us from
> back-patching changes to the backup label contents if we go this way.
> I don't have a strong opinion on that and we could leave the question
> for later.

I'm worried about the possibility of back patching this unless the 
solution turns out to be simpler than I think, and that rarely comes to 
pass. Surely throwing errors on something that is currently valid (i.e. 
backup_label and pg_control both present) is not something we could 
back patch.

But perhaps there is a simpler, acceptable solution we could back patch 
(transparent to all parties except Postgres) and then a more advanced 
solution we could go forward with.
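
For reference, the check Stephen and Thomas describe (complain when both files are present and hint at removing the control file) could look roughly like the sketch below. This is only an illustration in Python, not PostgreSQL code: check_restored_data_dir is a hypothetical name, the message wording is made up, and it assumes the proposed future behavior where the control file image travels inside backup_label. The file locations (backup_label in the data directory, pg_control under global/) are the usual ones.

```python
import os
from typing import Optional


def check_restored_data_dir(data_dir: str) -> Optional[str]:
    """Return a complaint if backup_label and pg_control are both present.

    Returns None when there is nothing to complain about.
    """
    backup_label = os.path.join(data_dir, "backup_label")
    pg_control = os.path.join(data_dir, "global", "pg_control")

    if os.path.exists(backup_label) and os.path.exists(pg_control):
        # Hint at removing the control file, never the backup label.
        return (
            "backup_label and global/pg_control are both present; "
            "HINT: remove global/pg_control (not backup_label) "
            "before starting recovery"
        )
    return None
```

Note that on current releases a restored backup normally contains both files, so a check like this would reject backups that are valid today. That is the back-patching concern raised above.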

I guess I had better get busy on this.

Regards,
-David

[1] 
https://www.postgresql.org/message-id/ZL69NXjCNG%2BWHCqG%40tamriel.snowman.net


