Re: Return pg_control from pg_backup_stop(). - Mailing list pgsql-hackers

From David Steele
Subject Re: Return pg_control from pg_backup_stop().
Date
Msg-id 3b23e3b7-53d2-4784-b482-05cca3327acb@pgbackrest.org
Whole thread Raw
In response to Re: Return pg_control from pg_backup_stop().  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Return pg_control from pg_backup_stop().
List pgsql-hackers
On 3/18/26 11:53, Michael Paquier wrote:
> On Wed, Mar 18, 2026 at 04:05:24AM +0000, David Steele wrote:
>> On 3/18/26 08:43, Michael Paquier wrote:
>>> I was wondering if we should have an assertion at least to cross-check
>>> that the contents we store in shared memory never go out-of-sync with
>>> the on-disk contents, in the shape of a USE_ASSERT_CHECKING block that
>>> calls get_controlfile() and memcmp()'s the contents between shmem and
>>> the on-disk file, while the LWLock is taken.  We ship the control file
>>> last on purpose, one reason being backups taken from standbys, so that
>>> may be sensible to do.
>>
>> As far as I can see this should always be true -- I audited all the
>>
>> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE)
>>
>> sections and the file is always saved once if is updated. Let me see if I
>> can add this check without too much pain, e.g. an additional parameter.
> 
> This matches with my reads of the code.  The attached check, that can
> be applied on top of your patches, passes under check-world.

Looks good to me. I have integrated it into the attached patches. I just 
changed it to look at our copy instead of the global ControlData since 
that is the one we are going to save.

>> Using the pg_control copy from pg_backup_stop() is entirely optional and
>> nothing is broken if vendors decide not to use it. This can be demonstrated
>> by applying the 01 patch without 02. In that case the tests in
>> 042_low_level_backup continue to run. You can also apply 01 and 02 and
>> revert the test changes in 042_low_level_backup and that works, too.
> 
> FWIW, after a second look I am actually wondering if 0002 is safe at
> all.  The contents of the control file are fetched after we are done
> with do_pg_backup_stop(), and there could be a bunch of activity that
> happens between the end of do_pg_backup_stop() and the
> backup_control_file() call, where the control file could be updated
> more and interfere with the recovery startup for some of its fields?
> GUC parameter updates that may touch the control file are one thing
> popping into mind.

You are correct -- the copy of pg_control needs to happen before 
do_pg_backup_stop(). An older version of this patch saved pg_control in 
backup_state which made the prior location safe. However, I missed 
moving this code when I moved pg_control out of backup_state. Code 
review to the rescue.

I believe I have addressed all current review comments in the attached 
patches. I tested Postgres crashing right after pg_control is updated 
but before backup_label is renamed. It worked as expected on restart. I 
also manually removed backup_label before restarting and that worked as 
well. I wrote a comment documenting all that.

One thing I could do is note in the documentation that it is not 
strictly necessary to get pg_control from pg_backup_stop(). Right now it 
sounds like copying from disk is no longer an option -- but it is if you 
are willing to accept the possibility of pg_control being torn. But I'd 
hope most well-maintained backup software is taking care of that by now.

The problem with caveats in the docs is it can lead to confusion and 
getting pg_control from pg_backup_stop() is just a better idea in general.

Thoughts?

Regards,
-David

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: SQL Property Graph Queries (SQL/PGQ)
Next
From: Imran Zaheer
Date:
Subject: Re: [WIP] Pipelined Recovery