Re: Requiring recovery.signal or standby.signal when recovering with a backup_label - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Requiring recovery.signal or standby.signal when recovering with a backup_label |
Date | |
Msg-id | ZUBM6BNQnEh7lzIM@paquier.xyz |
In response to | Re: Requiring recovery.signal or standby.signal when recovering with a backup_label (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Requiring recovery.signal or standby.signal when recovering with a backup_label
|
List | pgsql-hackers |
On Mon, Oct 30, 2023 at 01:55:13PM -0400, Robert Haas wrote:
> I would encourage some caution here.

Thanks for chiming in here.

> In a vacuum, I'm in favor of this, and for the same reasons as you,
> namely, that the huge pile of Booleans that we use to control recovery
> is confusing, and it's difficult to make sure that all the code paths
> are adequately tested, and I think some of the things that actually
> work here are not documented.

Yep, same feeling here.

> But in practice, I think there is a possibility of something like this
> backfiring very hard. Notice that the first two people who commented
> on the thread saw the error and immediately removed backup_label even
> though that's 100% wrong. It shows how utterly willing users are to
> remove backup_label for any reason or no reason at all. If we convert
> cases where things would have worked into cases where people nuke
> backup_label and then it appears to work, we're going to be worse off
> in the long run, no matter how crazy the idea of removing backup_label
> may seem to us.

As far as I know, there's one paragraph in the docs that implies this
mode without giving an actual hint that this may be OK or not, so
shrug:
https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-TIPS
"As with base backups, the easiest way to produce a standalone hot
backup is to use the pg_basebackup tool. If you include the -X
parameter when calling it, all the write-ahead log required to use the
backup will be included in the backup automatically, and no special
action is required to restore the backup."

And a few lines down the docs imply the use of restore_command,
something that we check is set only if recovery.signal is set. See
additionally validateRecoveryParameters(), where the comments imply
that InArchiveRecovery would be set only when there's a restore
command.

As you're telling me, and I've considered that as an option as well,
perhaps we should just consider the presence of a backup_label file
with no .signal files as a synonym of crash recovery? In the recovery
path, currently the essence of the problem is when we do
InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning
that it should do archive recovery but we don't want it, and that does
not really make sense. The rest of the code sort of implies that this
is not a supported combination.

So basically, my suggestion here is to just replay WAL up to the end
of what's in your local pg_wal/ and hope for the best, without TLI
jumps, except that we'd do nothing. Doing a pg_basebackup -X stream
followed by a restart would work fine with that, because all the WAL
is here.

A point of contention is whether we'd better be stricter about
satisfying backupEndPoint in such a case, but the redo code only wants
to do something here when ArchiveRecoveryRequested is set (aka there's
a .signal file set), and we would not want a TLI jump at the end of
recovery, so I don't see an argument for caring about backupEndPoint in
this case.

At the end, I'm OK as long as ArchiveRecoveryRequested=false
InArchiveRecovery=true does not exist anymore, because it's much easier
to get what's going on with the redo path, IMHO. (I have a patch at
hand to show the idea, will post it with a reply to Andres' message.)

> Also, Andres just recently mentioned to me that he uses this procedure
> of starting a server with a backup_label but no recovery.signal or
> standby.signal file regularly, and thinks other people do too. I was
> surprised, since I've never done that, except maybe when I was a noob
> and didn't have a clue. But Andres is far from a noob.

At this stage, that's basically at your own risk, as the code thinks
it's OK to force what's basically archive-recovery-without-being-it.
So it basically works, but it can also easily backfire.
--
Michael
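
The flag combination discussed in this message can be illustrated with a small standalone C model. This is only a sketch and not the code in the PostgreSQL tree: the struct and function names (StartupFiles, RecoveryFlags, flags_as_described, flags_as_proposed, is_contradictory) are hypothetical, and the model assumes the two flags depend only on which files are present, ignoring everything else the real startup code consults (such as the control file).

```c
#include <stdbool.h>

/* Hypothetical input: files found in the data directory at startup. */
typedef struct StartupFiles
{
	bool		has_backup_label;
	bool		has_recovery_signal;
	bool		has_standby_signal;
} StartupFiles;

/* The two flags named in the message, modeled as plain booleans. */
typedef struct RecoveryFlags
{
	bool		ArchiveRecoveryRequested;
	bool		InArchiveRecovery;
} RecoveryFlags;

/*
 * Simplified model of the behavior described in the message: a
 * backup_label forces InArchiveRecovery even when no .signal file
 * requested archive recovery.
 */
RecoveryFlags
flags_as_described(StartupFiles f)
{
	RecoveryFlags r;

	r.ArchiveRecoveryRequested = f.has_recovery_signal || f.has_standby_signal;
	r.InArchiveRecovery = r.ArchiveRecoveryRequested || f.has_backup_label;
	return r;
}

/*
 * Simplified model of the suggestion: a backup_label without any
 * .signal file is treated like crash recovery, so InArchiveRecovery
 * is never set unless archive recovery was requested.
 */
RecoveryFlags
flags_as_proposed(StartupFiles f)
{
	RecoveryFlags r;

	r.ArchiveRecoveryRequested = f.has_recovery_signal || f.has_standby_signal;
	r.InArchiveRecovery = r.ArchiveRecoveryRequested;
	return r;
}

/* The combination the message wants to eliminate. */
bool
is_contradictory(RecoveryFlags r)
{
	return r.InArchiveRecovery && !r.ArchiveRecoveryRequested;
}
```

In this model, a data directory holding only a backup_label yields the contradictory state under flags_as_described() and a plain crash-recovery state under flags_as_proposed(). Adding recovery.signal or standby.signal gives the same result in both.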