Re: Return pg_control from pg_backup_stop(). - Mailing list pgsql-hackers
| From | David Steele |
|---|---|
| Subject | Re: Return pg_control from pg_backup_stop(). |
| Date | |
| Msg-id | 3b23e3b7-53d2-4784-b482-05cca3327acb@pgbackrest.org Whole thread Raw |
| In response to | Re: Return pg_control from pg_backup_stop(). (Michael Paquier <michael@paquier.xyz>) |
| Responses |
Re: Return pg_control from pg_backup_stop().
|
| List | pgsql-hackers |
On 3/18/26 11:53, Michael Paquier wrote: > On Wed, Mar 18, 2026 at 04:05:24AM +0000, David Steele wrote: >> On 3/18/26 08:43, Michael Paquier wrote: >>> I was wondering if we should have an assertion at least to cross-check >>> that the contents we store in shared memory never go out-of-sync with >>> the on-disk contents, in the shape of a USE_ASSERT_CHECKING block that >>> calls get_controlfile() and memcmp()'s the contents between shmem and >>> the on-disk file, while the LWLock is taken. We ship the control file >>> last on purpose, one reason being backups taken from standbys, so that >>> may be sensible to do. >> >> As far as I can see this should always be true -- I audited all the >> >> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE) >> >> sections and the file is always saved once if is updated. Let me see if I >> can add this check without too much pain, e.g. an additional parameter. > > This matches with my reads of the code. The attached check, that can > be applied on top of your patches, passes under check-world. Looks good to me. I have integrated it into the attached patches. I just changed it to look at our copy instead of the global ControlData since that is the one we are going to save. >> Using the pg_control copy from pg_backup_stop() is entirely optional and >> nothing is broken if vendors decide not to use it. This can be demonstrated >> by applying the 01 patch without 02. In that case the tests in >> 042_low_level_backup continue to run. You can also apply 01 and 02 and >> revert the test changes in 042_low_level_backup and that works, too. > > FWIW, after a second look I am actually wondering if 0002 is safe at > all. The contents of the control file are fetched after we are done > with do_pg_backup_stop(), and there could be a bunch of activity that > happens between the end of do_pg_backup_stop() and the > backup_control_file() call, where the control file could be updated > more and interfere with the recovery startup for some of its fields? > GUC parameter updates that may touch the control file are one thing > popping into mind. You are correct -- the copy of pg_control needs to happen before do_pg_backup_stop(). An older version of this patch saved pg_control in backup_state which made the prior location safe. However, I missed moving this code when I moved pg_control out of backup_state. Code review to the rescue. I believe I have addressed all current review comments in the attached patches. I tested Postgres crashing right after pg_control is updated but before backup_label is renamed. It worked as expected on restart. I also manually removed backup_label before restarting and that worked as well. I wrote a comment documenting all that. One thing I could do is note in the documentation that it is not strictly necessary to get pg_control from pg_backup_stop(). Right now it sounds like copying from disk is no longer an option -- but it is if you are willing to accept the possibility of pg_control being torn. But I'd hope most well-maintained backup software is taking care of that by now. The problem with caveats in the docs is it can lead to confusion and getting pg_control from pg_backup_stop() is just a better idea in general. Thoughts? Regards, -David
Attachment
pgsql-hackers by date: