Re: Improving Physical Backup/Restore within the Low Level API - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Improving Physical Backup/Restore within the Low Level API
Msg-id CAKFQuwbdOBh+4xTmK3d1+27M9rrU1je4-=Ye9xo5tiWfFb_HoA@mail.gmail.com
In response to Re: Improving Physical Backup/Restore within the Low Level API  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
On Mon, Oct 16, 2023 at 10:26 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2023-10-16 at 09:26 -0700, David G. Johnston wrote:
> This email is a first pass at a user-visible design for how our backup and restore
> process, as enabled by the Low Level API, can be modified to make it more mistake-proof.
> In short, it requires pg_backup_start to further expand upon what it means for the
> system to be in the midst of a backup, pg_backup_stop to reverse those things,
> and modifying the startup process to deal with the server having crashed while the
> system is in that backup state.  Notes at the end extend the design to handle concurrent backups.
>
> The core functional changes are:
> 1) pg_backup_start sets a newly added "in backup" flag in pg_control to on.
> 2) pg_backup_stop sets that flag back to off.
> 3) postmaster will refuse to start if that flag is on, unless one of the following holds:
>   a) crash.signal exists in the data directory
>   b) recovery.signal exists in the data directory
>   c) standby.signal exists in the data directory
> 4) Signal file processing causes the in-backup flag in pg_control to be set to off.
>
> The newly added crash.signal file is required to handle the case where the server
> crashes after pg_backup_start and before pg_backup_stop.  It initiates a crash recovery
> of the instance just as is done today but with the added change of flipping the flag
> to off when recovery is complete just before going live.
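The four core changes listed above can be sketched as a small state model. This is a minimal, hypothetical sketch, not PostgreSQL code: `ControlFile`, `startup_action`, `complete_recovery` and the other names are invented for illustration, only the proposed in-backup flag is modelled, and the precedence among signal files is an assumption (standby over recovery matches current PostgreSQL; placing crash.signal last is a guess).

```python
from dataclasses import dataclass
from enum import Enum


class StartupAction(Enum):
    REFUSE = "refuse to start"
    CRASH_RECOVERY = "crash recovery"
    ARCHIVE_RECOVERY = "archive recovery"
    STANDBY = "standby"
    NORMAL = "normal startup"


@dataclass
class ControlFile:
    """Hypothetical stand-in for pg_control; only the proposed flag is modelled."""
    in_backup: bool = False


def backup_start(ctl: ControlFile) -> None:
    # Item 1: pg_backup_start turns the flag on.
    ctl.in_backup = True


def backup_stop(ctl: ControlFile) -> None:
    # Item 2: pg_backup_stop turns it back off.
    ctl.in_backup = False


def startup_action(ctl: ControlFile, signal_files: set[str]) -> StartupAction:
    # Item 3: any of the three signal files lets the server proceed.
    # The order of these checks is an assumption of this sketch.
    if "standby.signal" in signal_files:
        return StartupAction.STANDBY
    if "recovery.signal" in signal_files:
        return StartupAction.ARCHIVE_RECOVERY
    if "crash.signal" in signal_files:
        return StartupAction.CRASH_RECOVERY
    if ctl.in_backup:
        # Flag on and no signal file: refuse rather than guess what the user wants.
        return StartupAction.REFUSE
    return StartupAction.NORMAL


def complete_recovery(ctl: ControlFile) -> None:
    # Item 4: once signal-file-driven recovery finishes, just before
    # going live, the flag is cleared.
    ctl.in_backup = False
```

With the flag on and no signal file the sketch refuses to start; adding crash.signal selects crash recovery, after which `complete_recovery` clears the flag so the next startup is an ordinary one.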

I see a couple of problems and/or things that need clarification with that idea:

- Two backups can run concurrently.  How do you reconcile that with the "in backup"
  flag and crash.signal?
- I guess crash.signal is created during pg_backup_start().  So that file will be
  included in the backup.  How do you handle that during recovery?  Ignore it if
  another signal file is present?  And if the user forgets to create a signal file
  for recovery, how do you prevent PostgreSQL from performing crash recovery?


crash.signal is created in the pg_backup_metadata directory, not in the root of the data directory.  Should the server crash while any backup is in progress, pg_control would record that fact: in_backup=true would still be set, since in_backup=false only comes back after all concurrent backups have completed.  The server will then not restart without user intervention, specifically moving the crash.signal file from (one of) the pg_backup_metadata subdirectories to the root directory.  As there is nothing special about the crash.signal files in the pg_backup_metadata subdirectories, "touch crash.signal" could be used instead.
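The answer above, covering both concurrent backups and the placement of crash.signal, can be modelled roughly as follows. This is a minimal, hypothetical sketch: per-backup subdirectories named by a backup label, removal of that subdirectory by pg_backup_stop, and the names `BackupTracker` and `arm_crash_recovery` are assumptions not stated in the proposal. The flag is derived from an in-memory set here, whereas the proposal persists it in pg_control.

```python
import shutil
from pathlib import Path

METADATA_DIR = "pg_backup_metadata"  # directory name taken from the proposal


class BackupTracker:
    """Hypothetical model of concurrent backups under the proposal."""

    def __init__(self, datadir: Path) -> None:
        self.datadir = datadir
        # Stand-in for state that would really be persisted in pg_control.
        self.active: set[str] = set()

    @property
    def in_backup(self) -> bool:
        # The flag stays on until the last concurrent backup has stopped.
        return bool(self.active)

    def backup_start(self, label: str) -> None:
        subdir = self.datadir / METADATA_DIR / label
        subdir.mkdir(parents=True, exist_ok=False)
        # crash.signal lives in the per-backup subdirectory, not the data
        # directory root, so by itself it has no effect on startup.
        (subdir / "crash.signal").touch()
        self.active.add(label)

    def backup_stop(self, label: str) -> None:
        # Assumption: stopping a backup removes its metadata subdirectory.
        shutil.rmtree(self.datadir / METADATA_DIR / label)
        self.active.remove(label)


def arm_crash_recovery(datadir: Path) -> None:
    """User intervention after a crash: place crash.signal in the root.

    The file has no special contents, so creating an empty one is
    equivalent to moving one out of a pg_backup_metadata subdirectory.
    """
    (datadir / "crash.signal").touch()
```

Because every crash.signal sits below pg_backup_metadata, neither a crash during a backup nor a restored copy of the backup triggers crash recovery on its own; the flag blocks startup until someone deliberately places a signal file in the root.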

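The backed-up pg_control file will have in_backup=true, and since crash.signal is not in the root directory of the backup, a server started from the restored copy will refuse to start until the user supplies a signal file.  (I haven't yet considered the torn-read problem here; I suppose that placing a copy of pg_control into the pg_backup_metadata directory might be part of solving it.)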

David J.
