Re: BUG #17744: Fail Assert while recoverying from pg_basebackup - Mailing list pgsql-bugs
From | Michael Paquier |
---|---|
Subject | Re: BUG #17744: Fail Assert while recoverying from pg_basebackup |
Date | |
Msg-id | Y/wrERSMTIdCuy9N@paquier.xyz Whole thread Raw |
In response to | Re: BUG #17744: Fail Assert while recoverying from pg_basebackup (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-bugs |
On Mon, Feb 27, 2023 at 12:01:01PM +0900, Kyotaro Horiguchi wrote: > In my understanding, the approach differs slightly from the OP's > case. While server reaches consistency state early on using this > steps, in the OP's case it takes much longer time to reach > consistency, to the point where a checkpoint occurs before it is > reached. It seems that the approach taken by the patch does not work > in the OP's case, where (I guess that) hot-standby is disabled. Yep, hot standby is disabled in this case. Now note that we go through the hot-standby initialization based on (ArchiveRecoveryRequested && EnableHotStandby). The first one is true when we find a recovery.signal or standby.signal. The second one is the hot_standby GUC, hence recovery.signal enforces the initialization even if we know that we won't allow standby connections anyway. This is a waste, surely, if you only have recovery.signal, still it does not hurt? > As you know, pg_basebackup has the -R option. It seems to me that the > existence means we conciously allowed the backup to start without > archive recovery. The OP's operation (if it is correct) involves > taking a base backup and treating it as a literal backup, meaning it > begins its life on the origial timeline. This change prevents an > operation that we may have implicitly allowed (at least, that is my > understanding). Note that pg_basebackup -R enforces a standby.signal, not a recovery.signal. I am questioning whether we have to support the case of this thread, to be honest, particularly if it means simplifying the recovery logic that's now behind the two signal files and the backup_label. That's not a behavior that should be enforced in a stable branch, perhaps, but if it means removing some of the boolean flags controlling the recovery behavior, simplifying its logic, then I see a pretty good argument with doing a move in this direction if it eases the maintenance of this code with less cases to think about. We have too many of them already and these have caused us many issues in recent history (continuation records, for one), so I would draw a clear line and simplify things. Also, note "Recovering Using a Continuous Archive Backup", where we basically say that one *has* to put a recovery.signal when recovering using a base backup in one of the steps, which enforces a new timeline once recovery is done with a base backup. What I have sent upthread enforces that more in the presence of a backup_label file without recovery.signal, which may be too aggressive, perhaps, because it causes always a TLI jump, still it avoids the need of a hard failure to always require a recovery.signal or standby.signal if we are recovering from a base backup. If we were to support the case where we have (InArchiveRecovery && !ArchiveRecoveryRequested) during recovery, we'd better put a clear definition behind it. Personnally, I don't know what it means based on what is documented in xlogrecovery.c or xlog.c (echoing with [1]). > Thus, I slightly object to let backup_label to enforce archive > recovery. And rather prefer to suppress checkpoints before reaching > consistency(?). Suppressing restartpoints is not enough to completely close the issue. With the case I am mentioning above, it is possible to run restartpoints after consistency has been reached without the hot standby initialization done, causing the same failure when attempting to truncate pg_subtrans/ in a restart point :/ [1]: https://www.postgresql.org/message-id/20230201153252.l6kcfum7trdovw2b@alap3.anarazel.de -- Michael
Attachment
pgsql-bugs by date: