Thread: Requiring recovery.signal or standby.signal when recovering with a backup_label


From
Michael Paquier
Date:
Hi all,

This is a follow-up of the point I have made a few weeks ago on this
thread of pgsql-bugs about $subject:
https://www.postgresql.org/message-id/Y/Q/17rpYS7YGbIt@paquier.xyz
https://www.postgresql.org/message-id/Y/v0c+3W89NBT/if@paquier.xyz

Here is a short summary of what I think is incorrect, and what I'd
like to do to improve things moving forward, pointing to a simple
solution.

While looking at the aforementioned thread, I dug into the recovery
code and found what looks like an incorrect assumption behind the two
boolean flags named ArchiveRecoveryRequested and InArchiveRecovery
that we have in xlogrecovery.c to control the behavior of archive
recovery in the startup process.  For information, as of HEAD, these
two are described as follows:
/*
 * When ArchiveRecoveryRequested is set, archive recovery was requested,
 * ie. signal files were present.  When InArchiveRecovery is set, we are
 * currently recovering using offline XLOG archives.  These variables are only
 * valid in the startup process.
 *
 * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
 * currently performing crash recovery using only XLOG files in pg_wal, but
 * will switch to using offline XLOG archives as soon as we reach the end of
 * WAL in pg_wal.
 */
bool       ArchiveRecoveryRequested = false;
bool       InArchiveRecovery = false;

When you read this text alone, its assumptions are simple.  When the
startup process finds a recovery.signal or a standby.signal, it
switches ArchiveRecoveryRequested to true.  If there is a
standby.signal, InArchiveRecovery is immediately set to true before
beginning the redo loop.  If we begin redo with a recovery.signal,
hence ArchiveRecoveryRequested = true and InArchiveRecovery = false,
crash recovery happens first, consuming all the WAL in pg_wal/, and
then we move on with archive recovery.

Now comes the problem of the other thread, which is what happens when
you use a backup_label *without* a recovery.signal or a
standby.signal.  In this case, as currently coded, it is possible to
end up with ArchiveRecoveryRequested = false and, later,
InArchiveRecovery = true.  Not setting ArchiveRecoveryRequested has a
couple of undesired effects.  First, this skips some initialization
steps that may be needed at a later point in recovery.  The thread
quoted above has reported one aspect of that: we miss some
hot-standby-related initialization that matters once enough WAL has
been replayed that a restart point could happen.  Depending on the
amount of data copied into pg_wal/ before starting a node with only a
backup_label, it may also be possible that a consistent point has
been reached, where restart points would be legit.  A second, more
subtle effect is that we miss the initialization of
recoveryWakeupLatch.  A third effect is that some code paths can use
GUC values related to recovery without ArchiveRecoveryRequested being
set; one example seems to be hot_standby, whose default is true.

It is worth noting the end of FinishWalRecovery(), which includes
this part:
    if (ArchiveRecoveryRequested)
    {
        /*
         * We are no longer in archive recovery state.
         *
         * We are now done reading the old WAL.  Turn off archive fetching if
         * it was active.
         */
        Assert(InArchiveRecovery);
        InArchiveRecovery = false;

I have been pondering for a few weeks now about what kind of
definition would suit a cluster that has a backup_label file without
a signal file, a combination documented as required by the docs in
the HA section as well as for pg_rewind.  It is true that there could
be a point in allowing such a configuration so that a node recovers
without a TLI jump, but I do not find this case appealing: a node
could just happily overwrite WAL segments in the archives on an
existing timeline, potentially corrupting other nodes writing on the
same TLI.  There are a few other recovery scenarios where one copies
WAL segments directly into pg_wal/ that can lead to a lot of weird
inconsistencies as well, one being the report of the thread on
pgsql-hackers.

At the end, I'd like to think that we should just require a
recovery.signal or a standby.signal if we have a backup_label file,
and even enforce this rule at the end of recovery as a sanity check.
I don't think that we can just force ArchiveRecoveryRequested to true
in this path either, as the backup_label is renamed to .old once the
control file knows up to which LSN it needs to replay to reach
consistency and whether an end-of-backup record is required.  That's
not something that can reasonably be backpatched, as it could disturb
some recovery workflows, even if these are kind of in a dangerous
spot, IMO, so I would like to propose that only on HEAD for 16~,
because the recovery code has never really considered this
combination of ArchiveRecoveryRequested and InArchiveRecovery.

While digging into that, I have found one TAP test of pg_basebackup
that was doing recovery with just a backup_label file, with a
restore_command already set.  A second case was in pg_rewind, where
we have a node without a standby.signal that still uses a
primary_conninfo.

Attached is a patch along the lines of what I am thinking about.
This reworks a bit some of the actions at the beginning of the
startup process:
- Report the set of LOGs showing the state of the node after reading
the backup_label.
- Enforce a rule in ShutdownWalRecovery() and document the
restriction.
- Add a check on the signal files after finding a backup_label
file.
- Apply the validation checks on the recovery parameters (i.e.
restore_command required with recovery.signal, primary_conninfo
required on a standby for streaming, etc.).

My apologies for the long message, but this deserves some attention,
IMHO.

So, any thoughts?
--
Michael

Attachment
On Fri, Mar 10, 2023 at 03:59:04PM +0900, Michael Paquier wrote:
> My apologies for the long message, but this deserves some attention,
> IMHO.

Note: A CF entry has been added as of [1], and I have added an item in
the list of live issues on the open item page for 16.

[1]: https://commitfest.postgresql.org/43/4244/
[2]: https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Live_issues
--
Michael

Attachment

I believe that previously users could make a backup using pg_basebackup and then start the backup as an independent primary server for whatever reason. Now, if this is still allowed, then users need to be aware that the backup_label must be manually deleted; otherwise, the backup won't be able to start as a primary.

The current message below doesn't provide such a hint.

+		if (!ArchiveRecoveryRequested)
+			ereport(FATAL,
+					(errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"),
+					 errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and add required recovery options.",
+							 DataDir, DataDir)));
On 2023-03-12 6:06 p.m., Michael Paquier wrote:
On Fri, Mar 10, 2023 at 03:59:04PM +0900, Michael Paquier wrote:
My apologies for the long message, but this deserves some attention,
IMHO.
Note: A CF entry has been added as of [1], and I have added an item in
the list of live issues on the open item page for 16.

[1]: https://commitfest.postgresql.org/43/4244/
[2]: https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Live_issues
--
Michael

Best regards,

David

On Fri, Jul 14, 2023 at 01:32:49PM -0700, David Zhang wrote:
> I believe that previously users could make a backup using pg_basebackup and
> then start the backup as an independent primary server for whatever reason.
> Now, if this is still allowed, then users need to be aware that the
> backup_label must be manually deleted; otherwise, the backup won't be able
> to start as a primary.

Deleting a backup_label from a fresh base backup can easily lead to
data corruption, as the startup process would pick the LSN to start
recovery from the control file rather than from the backup_label
file.  This would happen if a checkpoint updates the redo LSN in the
control file while a backup happens and the control file is copied
after the checkpoint, for instance.  If one wishes to deploy a new
primary from a base backup, recovery.signal is the way to go, making
sure that the new primary is bumped onto a new timeline once recovery
finishes, on top of making sure that the startup process starts
recovery from a position where the cluster is able to achieve a
consistent state.

> The current message below doesn't provide such a hint.
>
> +        if (!ArchiveRecoveryRequested)
> +            ereport(FATAL,
> +                    (errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"),
> +                     errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and add required recovery options.",
> +                             DataDir, DataDir)));

How would you rewrite that?  I am not sure how many details we want to
put here in terms of differences between recovery.signal and
standby.signal, still we surely should mention these are the two
possible choices.
--
Michael

Attachment
On 2023-07-16 6:27 p.m., Michael Paquier wrote:
>
> Delete a backup_label from a fresh base backup can easily lead to data
> corruption, as the startup process would pick up as LSN to start
> recovery from the control file rather than the backup_label file.
> This would happen if a checkpoint updates the redo LSN in the control
> file while a backup happens and the control file is copied after the
> checkpoint, for instance.  If one wishes to deploy a new primary from
> a base backup, recovery.signal is the way to go, making sure that the
> new primary is bumped into a new timeline once recovery finishes, on
> top of making sure that the startup process starts recovery from a
> position where the cluster would be able to achieve a consistent
> state.
Thanks a lot for sharing this information.
>
> How would you rewrite that?  I am not sure how many details we want to
> put here in terms of differences between recovery.signal and
> standby.signal, still we surely should mention these are the two
> possible choices.

Honestly, I can't convince myself to mention the backup_label here too. 
But, I can share some information regarding my testing of the patch and 
the corresponding results.

To assess the impact of the patch, I executed the following commands
before and after applying it:

pg_basebackup -h localhost -p 5432 -U david -D pg_backup1

pg_ctl -D pg_backup1 -l /tmp/logfile start

Before the patch, there were no issues encountered when starting an 
independent Primary server.


However, after applying the patch, I observed the following behavior 
when starting from the base backup:

1) simply start server from a base backup

FATAL:  could not find recovery.signal or standby.signal when recovering 
with backup_label

HINT:  If you are restoring from a backup, touch 
"/media/david/disk1/pg_backup1/recovery.signal" or 
"/media/david/disk1/pg_backup1/standby.signal" and add required recovery 
options.

2) touch a recovery.signal file and then try to start the server, the 
following error was encountered:

FATAL:  must specify restore_command when standby mode is not enabled

3) touch a standby.signal file, then the server successfully started, 
however, it operates in standby mode, whereas the intended behavior was 
for it to function as a primary server.


Best regards,

David

On Wed, Jul 19, 2023 at 11:21:17AM -0700, David Zhang wrote:
> 1) simply start server from a base backup
>
> FATAL:  could not find recovery.signal or standby.signal when recovering
> with backup_label
>
> HINT:  If you are restoring from a backup, touch
> "/media/david/disk1/pg_backup1/recovery.signal" or
> "/media/david/disk1/pg_backup1/standby.signal" and add required recovery
> options.

Note the difference when --write-recovery-conf is specified, where a
standby.signal is created along with a primary_conninfo in
postgresql.auto.conf.  So, yes, that's expected by default with the
patch.

> 2) touch a recovery.signal file and then try to start the server, the
> following error was encountered:
>
> FATAL:  must specify restore_command when standby mode is not enabled

Yes, that's also something expected in the scope of the v1 posted.
The idea behind this restriction is that specifying recovery.signal
is equivalent to asking for archive recovery, while not specifying
restore_command is equivalent to not providing any option to be able
to recover.  See validateRecoveryParameters() and note that this
restriction has existed since the beginning of times, introduced in
commit 66ec2db.  I tend to agree that there is something to be said
about self-contained backups taken from pg_basebackup, though, as
these would fail if no restore_command is specified, and this
restriction was put in place before Postgres introduced replication
and easier ways to take base backups.  As a whole, I think that there
is a good argument in favor of removing this restriction for the case
where archive recovery is requested, if users have all their WAL in
pg_wal/ to be able to recover up to a consistent point, while keeping
these GUC restrictions when requesting a standby (not
recovery.signal, only standby.signal).

> 3) touch a standby.signal file, then the server successfully started,
> however, it operates in standby mode, whereas the intended behavior was for
> it to function as a primary server.

standby.signal implies that the server will start in standby mode.
If one wants to deploy a new primary, which implies a timeline jump
at the end of recovery, you need to specify recovery.signal instead.
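To make the two setups concrete, here is what the on-disk difference
amounts to; the signal file names and GUCs are real, but the command
value and host below are only examples:

```
# New primary from a base backup (timeline jump at end of recovery):
#     touch $PGDATA/recovery.signal
# plus, in postgresql.conf, something like:
restore_command = 'cp /path/to/archive/%f "%p"'    # example path

# Standby instead:
#     touch $PGDATA/standby.signal
primary_conninfo = 'host=primary.example.com user=replication'   # example values
```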

We need more discussions and more opinions, but the discussion has
stalled for a few months now.  In case, I am adding Thomas Munro in CC
who has mentioned to me at PGcon that he was interested in this patch
(this thread's problem is not directly related to the fact that the
checkpointer now runs in crash recovery, though).

For now, I am attaching a rebased v2.
--
Michael

Attachment
Thanks for the patch.

I reran the test in
https://www.postgresql.org/message-id/flat/ZQtzcH2lvo8leXEr%40paquier.xyz#cc5ed83e0edc0b9a1c1305f08ff7a335
so that we can discuss all the problems in this thread.

First I encountered the problem " FATAL:  could not find
recovery.signal or standby.signal when recovering with backup_label ",
then I deleted the backup_label file and started the instance
successfully.

> Deleting a backup_label from a fresh base backup can easily lead to
> data corruption, as the startup process would pick the LSN to start
> recovery from the control file rather than from the backup_label
> file.  This would happen if a checkpoint updates the redo LSN in the
> control file while a backup happens and the control file is copied
> after the checkpoint, for instance.  If one wishes to deploy a new
> primary from a base backup, recovery.signal is the way to go, making
> sure that the new primary is bumped onto a new timeline once recovery
> finishes, on top of making sure that the startup process starts
> recovery from a position where the cluster is able to achieve a
> consistent state.

    ereport(FATAL,
            (errmsg("could not find redo location referenced by checkpoint record"),
             errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
                     "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
                     "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
                     DataDir, DataDir, DataDir)));

There are two similar error messages in xlogrecovery.c. Maybe we can
modify the error messages to be similar.

--
Bowen Shi


On Thu, 21 Sept 2023 at 11:01, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 19, 2023 at 11:21:17AM -0700, David Zhang wrote:
> > 1) simply start server from a base backup
> >
> > FATAL:  could not find recovery.signal or standby.signal when recovering
> > with backup_label
> >
> > HINT:  If you are restoring from a backup, touch
> > "/media/david/disk1/pg_backup1/recovery.signal" or
> > "/media/david/disk1/pg_backup1/standby.signal" and add required recovery
> > options.
>
> Note the difference when --write-recovery-conf is specified, where a
> standby.signal is created along with a primary_conninfo in
> postgresql.auto.conf.  So, yes, that's expected by default with the
> patch.
>
> > 2) touch a recovery.signal file and then try to start the server, the
> > following error was encountered:
> >
> > FATAL:  must specify restore_command when standby mode is not enabled
>
> Yes, that's also something expected in the scope of the v1 posted.
> The idea behind this restriction is that specifying recovery.signal
> is equivalent to asking for archive recovery, while not specifying
> restore_command is equivalent to not providing any option to be able
> to recover.  See validateRecoveryParameters() and note that this
> restriction has existed since the beginning of times, introduced in
> commit 66ec2db.  I tend to agree that there is something to be said
> about self-contained backups taken from pg_basebackup, though, as
> these would fail if no restore_command is specified, and this
> restriction was put in place before Postgres introduced replication
> and easier ways to take base backups.  As a whole, I think that there
> is a good argument in favor of removing this restriction for the case
> where archive recovery is requested, if users have all their WAL in
> pg_wal/ to be able to recover up to a consistent point, while keeping
> these GUC restrictions when requesting a standby (not
> recovery.signal, only standby.signal).
>
> > 3) touch a standby.signal file, then the server successfully started,
> > however, it operates in standby mode, whereas the intended behavior was for
> > it to function as a primary server.
>
> standby.signal implies that the server will start in standby mode.
> If one wants to deploy a new primary, which implies a timeline jump
> at the end of recovery, you need to specify recovery.signal instead.
>
> We need more discussions and more opinions, but the discussion has
> stalled for a few months now.  In case, I am adding Thomas Munro in CC
> who has mentioned to me at PGcon that he was interested in this patch
> (this thread's problem is not directly related to the fact that the
> checkpointer now runs in crash recovery, though).
>
> For now, I am attaching a rebased v2.
> --
> Michael



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Thu, Sep 21, 2023 at 11:45:06AM +0800, Bowen Shi wrote:
> First I encountered the problem " FATAL:  could not find
> recovery.signal or standby.signal when recovering with backup_label ",
> then I deleted the backup_label file and started the instance
> successfully.

Doing that is equal to corrupting your instance, as recovery would
begin from the latest redo LSN stored in the control file, while the
physical relation files included in the backup may contain blocks
that require replaying records located between the LSN stored in the
backup_label and this redo LSN.

>> Deleting a backup_label from a fresh base backup can easily lead to
>> data corruption, as the startup process would pick the LSN to start
>> recovery from the control file rather than from the backup_label
>> file.  This would happen if a checkpoint updates the redo LSN in the
>> control file while a backup happens and the control file is copied
>> after the checkpoint, for instance.  If one wishes to deploy a new
>> primary from a base backup, recovery.signal is the way to go, making
>> sure that the new primary is bumped onto a new timeline once recovery
>> finishes, on top of making sure that the startup process starts
>> recovery from a position where the cluster is able to achieve a
>> consistent state.

And that's what I meant here, in more detail.  So, really, don't do
that.

>     ereport(FATAL,
>             (errmsg("could not find redo location referenced by checkpoint record"),
>              errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
>                      "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
>                      "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
>                      DataDir, DataDir, DataDir)));
>
> There are two similar error messages in xlogrecovery.c. Maybe we can
> modify the error messages to be similar.

The patch adds the following message, which is written this way to be
consistent with the two others, already:

+    ereport(FATAL,
+            (errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"),
+             errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and add required recovery options.",
+                     DataDir, DataDir)));

But you have an interesting point here, why isn't standby.signal also
mentioned in the two existing comments?  Depending on what's wanted by
the user this can be equally useful to report back.

Attached is a slightly updated patch, where I have also removed the
check on ArchiveRecoveryRequested, because the FATAL generated when
!ArchiveRecoveryRequested makes it useless after reading the
backup_label file.

This patch has been around for a few months now.  Do others have
opinions about the direction taken here to make the presence of
recovery.signal or standby.signal a hard requirement when a
backup_label file is found (HEAD only)?
--
Michael

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Kyotaro Horiguchi
Date:
At Fri, 10 Mar 2023 15:59:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> My apologies for the long message, but this deserves some attention,
> IMHO.
> 
> So, any thoughts?

Sorry for being late. However, I agree with David's concern regarding
the unnecessary inconvenience it introduces.  I'd like to maintain the
functionality.

While I agree that InArchiveRecovery should be activated only if
ArchiveRecoveryRequested is true, I am opposed to the notion that the
mere presence of backup_label should be interpreted as a request for
archive recovery (even if that is implied in a comment in
InitWalRecovery()).  Instead, I propose that we separate backup_label
and archive recovery; in other words, we should not turn on
InArchiveRecovery if !ArchiveRecoveryRequested, regardless of the
presence of backup_label.  We can know the minimum required recovery
LSN from the XLOG_BACKUP_END record.

The attached is a quick mock-up, but providing an approximation of my
thoughts. (For example, end_of_backup_reached could potentially be
extended to the ArchiveRecoveryRequested case and we could simplify
the condition..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Kyotaro Horiguchi
Date:
Sorry, it seems that I posted at the wrong position..

At Thu, 28 Sep 2023 12:58:51 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Fri, 10 Mar 2023 15:59:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> > My apologies for the long message, but this deserves some attention,
> > IMHO.
> > 
> > So, any thoughts?
> 
> Sorry for being late. However, I agree with David's concern regarding
> the unnecessary inconvenience it introduces.  I'd like to maintain the
> functionality.
> 
> While I agree that InArchiveRecovery should be activated only if
> ArchiveRecoveryRequested is true, I am opposed to the notion that the
> mere presence of backup_label should be interpreted as a request for
> archive recovery (even if that is implied in a comment in
> InitWalRecovery()).  Instead, I propose that we separate backup_label
> and archive recovery; in other words, we should not turn on
> InArchiveRecovery if !ArchiveRecoveryRequested, regardless of the
> presence of backup_label.  We can know the minimum required recovery
> LSN from the XLOG_BACKUP_END record.
> 
> The attached is a quick mock-up, but providing an approximation of my
> thoughts. (For example, end_of_backup_reached could potentially be
> extended to the ArchiveRecoveryRequested case and we could simplify
> the condition..)

regards

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Thu, Sep 28, 2023 at 12:58:51PM +0900, Kyotaro Horiguchi wrote:
> The attached is a quick mock-up, but providing an approximation of my
> thoughts. (For example, end_of_backup_reached could potentially be
> extended to the ArchiveRecoveryRequested case and we could simplify
> the condition..)

I am not sure why this is related to this thread..

 static XLogRecPtr backupStartPoint;
 static XLogRecPtr backupEndPoint;
 static bool backupEndRequired = false;
+static bool backupEndReached = false;

Anyway, sneaking a peek at your suggestion, it actually outlines the
main issue I have with this code currently.  We have so many static
booleans controlling one behavior or another that we keep making this
code more complicated, while we should try to make it simpler
instead.
--
Michael

Attachment
On 9/27/23 23:58, Kyotaro Horiguchi wrote:
> At Fri, 10 Mar 2023 15:59:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in
>> My apologies for the long message, but this deserves some attention,
>> IMHO.
>>
>> So, any thoughts?
> 
> Sorry for being late. However, I agree with David's concern regarding
> the unnecessary inconvenience it introduces.  I'd like to maintain the
> functionality.

After some playing around, I find I agree with Michael on this, i.e. 
require at least standby.signal when a backup_label is present.

According to my testing, you can preserve the "independent server" 
functionality by setting archive_command = /bin/false. In this case the 
timeline is not advanced and recovery proceeds from whatever is 
available in pg_wal.

I think this type of recovery from a backup label without a timeline 
change should absolutely be the exception, not the default as it seems 
to be now. If the server is truly independent, then the timeline change 
is not important. If the server is not independent, then the timeline 
change is critical.

So overall, +1 for Michael's patch, though I have only read through it 
and not tested it yet.

One comment, though, if we are going to require recovery.signal when 
backup_label is present, should it just be implied? Why error and force 
the user to create it?

Regards,
-David



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Thu, Sep 28, 2023 at 04:23:42PM -0400, David Steele wrote:
> After some playing around, I find I agree with Michael on this, i.e. require
> at least standby.signal when a backup_label is present.
>
> According to my testing, you can preserve the "independent server"
> functionality by setting archive_command = /bin/false. In this case the
> timeline is not advanced and recovery proceeds from whatever is available in
> pg_wal.

I've seen folks depend on such setups in the past, actually, letting a
process outside Postgres just "push" WAL segments to pg_wal instead of
Postgres pulling them with a restore_command or a primary_conninfo for
a standby.

> I think this type of recovery from a backup label without a timeline change
> should absolutely be the exception, not the default as it seems to be now.

This can mess up archives pretty easily, additionally, so it's not
something to encourage..

> If the server is truly independent, then the timeline change is not
> important. If the server is not independent, then the timeline change is
> critical.
>
> So overall, +1 for Michael's patch, though I have only read through it and
> not tested it yet.

Reviews, thoughts and opinions are welcome.

> One comment, though, if we are going to require recovery.signal when
> backup_label is present, should it just be implied? Why error and force the
> user to create it?

That's one thing I was considering, but I also cannot convince myself
that this is the best option, because the presence of recovery.signal
or standby.signal (if both, standby.signal takes priority) makes it
clear what type of recovery is wanted at the disk level.  I'd be OK
if folks think that this is a sensible consensus, as well, even if I
don't really agree with it.

Another idea I had was to force the creation of recovery.signal by
pg_basebackup even if -R is not used.  All the reports we've seen with
people getting confused came from pg_basebackup that enforces no
configuration.

A last thing, that had better be covered in a separate thread and
patch, is about validateRecoveryParameters().  These days, I'd like to
think that it may be OK to lift at least the restriction on
restore_command being required if we are doing recovery to ease the
case of self-contained backups (aka the case where all the WAL needed
to reach a consistent point is in pg_wal/ or its tarball)
--
Michael

Attachment
On 9/28/23 19:59, Michael Paquier wrote:
> On Thu, Sep 28, 2023 at 04:23:42PM -0400, David Steele wrote:
>>
>> So overall, +1 for Michael's patch, though I have only read through it and
>> not tested it yet.
> 
> Reviews, thoughts and opinions are welcome.

OK, I have now reviewed and tested the patch and it looks good to me. I 
stopped short of marking this RfC since there are other reviewers in the 
mix.

I dislike that we need to repeat:

OwnLatch(&XLogRecoveryCtl->recoveryWakeupLatch);

But I see the logic behind why you did it and there's no better way to 
do it as far as I can see.

>> One comment, though, if we are going to require recovery.signal when
>> backup_label is present, should it just be implied? Why error and force the
>> user to create it?
> 
> That's one thing I was considering, but I also cannot convince myself
> that this is the best option, because the presence of recovery.signal
> or standby.signal (if both, standby.signal takes priority) makes it
> clear what type of recovery is wanted at the disk level.  I'd be OK
> if folks think that this is a sensible consensus, as well, even if I
> don't really agree with it.

I'm OK with keeping it as required for now.

> Another idea I had was to force the creation of recovery.signal by
> pg_basebackup even if -R is not used.  All the reports we've seen with
> people getting confused came from pg_basebackup that enforces no
> configuration.

This change makes it more obvious if configuration is missing (since 
you'll get an error), however +1 for adding this to pg_basebackup.

> A last thing, that had better be covered in a separate thread and
> patch, is about validateRecoveryParameters().  These days, I'd like to
> think that it may be OK to lift at least the restriction on
> restore_command being required if we are doing recovery to ease the
> case of self-contained backups (aka the case where all the WAL needed
> to reach a consistent point is in pg_wal/ or its tarball)

Hmmm, I'm not sure about this. I'd prefer users set 
restore_command=/bin/false explicitly to fetch WAL from pg_wal by 
default if that's what they really intend.

Regards,
-David



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Sat, Oct 14, 2023 at 03:45:33PM -0400, David Steele wrote:
> On 9/28/23 19:59, Michael Paquier wrote:
> OK, I have now reviewed and tested the patch and it looks good to me. I
> stopped short of marking this RfC since there are other reviewers in the
> mix.

Thanks for the review.  Yes, I am wondering if other people would
chime in here.  It doesn't feel like this has gathered enough
opinions.  Now this thread has been around for many months, and we've
made quite a few changes in the backup APIs in the last few years with
few users complaining about them.

> I dislike that we need to repeat:
>
> OwnLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
>
> But I see the logic behind why you did it and there's no better way to do it
> as far as I can see.

The main point is that there is no meaning in setting the latch until
the backup_label file is read, because if ArchiveRecoveryRequested is
*not* set the startup process would outright fail due to the lack of
[recovery|standby].signal.

>> Another idea I had was to force the creation of recovery.signal by
>> pg_basebackup even if -R is not used.  All the reports we've seen with
>> people getting confused came from pg_basebackup that enforces no
>> configuration.
>
> This change makes it more obvious if configuration is missing (since you'll
> get an error), however +1 for adding this to pg_basebackup.

Looking at the streaming APIs of pg_basebackup, it looks like this
would be a matter of using bbstreamer_inject_file() to inject an empty
file into the stream.  Still something seems to be off once
compression methods are involved..  Hmm.  I am not sure.  Well, this
could always be done as a patch independent of this one, under a
separate discussion.  There are extra arguments about whether it would
be a good idea to add a recovery.signal even when taking a backup from
a standby, and do that only in 17~.

>> A last thing, that had better be covered in a separate thread and
>> patch, is about validateRecoveryParameters().  These days, I'd like to
>> think that it may be OK to lift at least the restriction on
>> restore_command being required if we are doing recovery to ease the
>> case of self-contained backups (aka the case where all the WAL needed
>> to reach a consistent point is in pg_wal/ or its tarball)
>
> Hmmm, I'm not sure about this. I'd prefer users set
> restore_command=/bin/false explicitly to fetch WAL from pg_wal by default if
> that's what they really intend.

It wouldn't be the first time we break compatibility in this area, so
perhaps you are right and keeping this requirement is fine, even if it
requires one extra step when recovering a self-contained backup
generated by pg_basebackup.  At least this forces users to look at
their setup and check if something is wrong.  We'd likely finish with
a few "bug" reports, as well :D
--
Michael

Attachment
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

It looks good to me.

I have reviewed the code and tested the patch with a basic check-world test and a pgbench test (mentioned in
https://www.postgresql.org/message-id/flat/ZQtzcH2lvo8leXEr%40paquier.xyz#cc5ed83e0edc0b9a1c1305f08ff7a335).

Another reviewer has also approved it, so I am changing the status to RFC.

The new status of this patch is: Ready for Committer

On Mon, 2023-10-16 at 14:54 +0900, Michael Paquier wrote:
> Thanks for the review.  Yes, I am wondering if other people would
> chime in here.  It doesn't feel like this has gathered enough
> opinions.

I don't have strong feelings either way.  If you have backup_label
but no signal file, starting PostgreSQL may succeed (if the WAL
with the checkpoint happens to be in pg_wal) or it may fail with
an error message.  There is no danger of causing damage unless you
remove backup_label, right?

I cannot think of a use case where you use such a configuration on
purpose, and the current error message is more cryptic than a plain
"you must have a signal file to start from a backup", so perhaps
your patch is a good idea.

Yours,
Laurenz Albe



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Mon, Oct 16, 2023 at 05:48:43PM +0200, Laurenz Albe wrote:
> I don't have strong feelings either way.  If you have backup_label
> but no signal file, starting PostgreSQL may succeed (if the WAL
> with the checkpoint happens to be in pg_wal) or it may fail with
> an error message.  There is no danger of causing damage unless you
> remove backup_label, right?

A bit more happens currently if you have a backup_label with no signal
files, unfortunately, because this causes some startup states to not
be initialized.  See around here:
https://www.postgresql.org/message-id/Y/Q/17rpYS7YGbIt@paquier.xyz
https://www.postgresql.org/message-id/Y/v0c+3W89NBT/if@paquier.xyz

> I cannot think of a use case where you use such a configuration on
> purpose, and the current error message is more cryptic than a plain
> "you must have a signal file to start from a backup", so perhaps
> your patch is a good idea.

I hope so.
--
Michael

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Mon, Oct 16, 2023 at 02:54:35PM +0900, Michael Paquier wrote:
> On Sat, Oct 14, 2023 at 03:45:33PM -0400, David Steele wrote:
>> On 9/28/23 19:59, Michael Paquier wrote:
>>> Another idea I had was to force the creation of recovery.signal by
>>> pg_basebackup even if -R is not used.  All the reports we've seen with
>>> people getting confused came from pg_basebackup that enforces no
>>> configuration.
>>
>> This change makes it more obvious if configuration is missing (since you'll
>> get an error), however +1 for adding this to pg_basebackup.
>
> Looking at the streaming APIs of pg_basebackup, it looks like this
> would be a matter of using bbstreamer_inject_file() to inject an empty
> file into the stream.  Still something seems to be off once
> compression methods are involved..  Hmm.  I am not sure.  Well, this
> could always be done as a patch independent of this one, under a
> separate discussion.  There are extra arguments about whether it would
> be a good idea to add a recovery.signal even when taking a backup from
> a standby, and do that only in 17~.

Hmm.  On this specific point, it would actually be much simpler to
force recovery.signal to be in the contents streamed to a BASE_BACKUP.
This does not step on your proposal at [1], though, because you'd
still require a .signal file for recovery as far as I understand :/

[1]: https://www.postgresql.org/message-id/2daf8adc-8db7-4204-a7f2-a7e94e2bfa4b@pgmasters.net

Would folks be OK to move on with the patch of this thread at the end?
I am attempting a last-call kind of thing.
--
Michael

Attachment
On 10/27/23 03:22, Michael Paquier wrote:
> On Mon, Oct 16, 2023 at 02:54:35PM +0900, Michael Paquier wrote:
>> On Sat, Oct 14, 2023 at 03:45:33PM -0400, David Steele wrote:
>>> On 9/28/23 19:59, Michael Paquier wrote:
>>>> Another idea I had was to force the creation of recovery.signal by
>>>> pg_basebackup even if -R is not used.  All the reports we've seen with
>>>> people getting confused came from pg_basebackup that enforces no
>>>> configuration.
>>>
>>> This change makes it more obvious if configuration is missing (since you'll
>>> get an error), however +1 for adding this to pg_basebackup.
>>
>> Looking at the streaming APIs of pg_basebackup, it looks like this
>> would be a matter of using bbstreamer_inject_file() to inject an empty
>> file into the stream.  Still something seems to be off once
>> compression methods are involved..  Hmm.  I am not sure.  Well, this
>> could always be done as a patch independent of this one, under a
>> separate discussion.  There are extra arguments about whether it would
>> be a good idea to add a recovery.signal even when taking a backup from
>> a standby, and do that only in 17~.
> 
> Hmm.  On this specific point, it would actually be much simpler to
> force recovery.signal to be in the contents streamed to a BASE_BACKUP.

That sounds like the right plan to me. Nice and simple.

> This does not step on your proposal at [1], though, because you'd
> still require a .signal file for recovery as far as I understand :/
> 
> [1]: https://www.postgresql.org/message-id/2daf8adc-8db7-4204-a7f2-a7e94e2bfa4b@pgmasters.net

Yes.

> Would folks be OK to move on with the patch of this thread at the end?
> I am attempting a last-call kind of thing.

I'm still +1 for the patch as it stands.

Regards,
-David



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Fri, Oct 27, 2023 at 09:31:10AM -0400, David Steele wrote:
> That sounds like the right plan to me. Nice and simple.

I'll tackle that in a separate thread with a patch registered for the
upcoming CF of November.

> I'm still +1 for the patch as it stands.

I have been reviewing the patch, and applied portions of it as of
dc5bd388 and 1ffdc03c and they're quite independent pieces.  After
that, the remaining bits of the patch that change the behavior are now
straightforward.  I have written a commit message for it, while on
it, as per the attached.
--
Michael

Attachment
On Mon, Oct 30, 2023 at 1:09 AM Michael Paquier <michael@paquier.xyz> wrote:

I have been reviewing the patch, and applied portions of it as of
dc5bd388 and 1ffdc03c and they're quite independent pieces.  After
that, the remaining bits of the patch that change the behavior are now
straightforward.  I have written a commit message for it, while on
it, as per the attached.

A suggestion for the hint message in an effort to improve readability:

"If you are restoring from a backup, ensure \"%s/recovery.signal\" or \"%s/standby.signal\" is present and add required recovery options."

I realize the original use of "touch" is a valid shortcut for what I suggest above; however, that will be less clear for the not-so-un*x-inclined users of Postgres, and for some it'll be downright confusing, IMHO. My wording also has the advantage of being crystal clear about what needs to be done to fix the problem.

Roberto
On Mon, Oct 30, 2023 at 3:09 AM Michael Paquier <michael@paquier.xyz> wrote:
> I have been reviewing the patch, and applied portions of it as of
> dc5bd388 and 1ffdc03c and they're quite independent pieces.  After
> that, the remaining bits of the patch that change the behavior are now
> straightforward.  I have written a commit message for it, while on
> it, as per the attached.

I would encourage some caution here.

In a vacuum, I'm in favor of this, and for the same reasons as you,
namely, that the huge pile of Booleans that we use to control recovery
is confusing, and it's difficult to make sure that all the code paths
are adequately tested, and I think some of the things that actually
work here are not documented.

But in practice, I think there is a possibility of something like this
backfiring very hard. Notice that the first two people who commented
on the thread saw the error and immediately removed backup_label even
though that's 100% wrong. It shows how utterly willing users are to
remove backup_label for any reason or no reason at all. If we convert
cases where things would have worked into cases where people nuke
backup_label and then it appears to work, we're going to be worse off
in the long run, no matter how crazy the idea of removing backup_label
may seem to us.

Also, Andres just recently mentioned to me that he uses this procedure
of starting a server with a backup_label but no recovery.signal or
standby.signal file regularly, and thinks other people do too. I was
surprised, since I've never done that, except maybe when I was a noob
and didn't have a clue. But Andres is far from a noob.

--
Robert Haas
EDB: http://www.enterprisedb.com



Hi,

On 2023-10-30 16:08:50 +0900, Michael Paquier wrote:
> From 26a8432fe3ab8426e7797d85d19b0fe69d3384c9 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael@paquier.xyz>
> Date: Mon, 30 Oct 2023 16:02:52 +0900
> Subject: [PATCH v4] Require recovery.signal or standby.signal when reading a
>  backup_file
>
> Historically, the startup process uses two static variables to control
> if archive recovery should happen, when either recovery.signal or
> standby.signal are defined in the data folder at the beginning of
> recovery:

I think the problem with these variables is that they're a really messy state
machine - something this patch doesn't meaningfully improve IMO.


> This configuration was possible when recovering from a base backup taken
> by pg_basebackup without -R.  Note that the documentation requires at
> least to set recovery.signal to restore from a backup, but the startup
> process was not making this policy explicit.

Maybe I just didn't check the right place, but from what I saw this is,
at most, implied rather than explicitly stated.


> In most cases, one would have been able to complete recovery, but that's a
> matter of luck, really, as it depends on the workload of the origin server.

With -X ... we have all the necessary WAL locally, how does the workload on
the primary matter? If you pass --no-slot, pg_basebackup might fail to fetch
the necessary wal, but then you'd also have gotten an error.


I agree with Robert that this would be a good error check on a green field,
but that I am less convinced it's going to help more than hurt now.


Right now running pg_basebackup with -X stream, without --write-recovery-conf,
gives you a copy of a cluster that will come up correctly as a distinct
instance.

With this change applied, you need to know that the way to avoid the existing
FATAL about restore_command at startup (when recovery.signal exists but
restore_command isn't set) is to set "restore_command = false",
something we don't explain anywhere afaict. We should lessen the need to ever
use restore_command, not increase it.

It also seems risky to have people get used to restore_command = false,
because that effectively disables detection of other timelines etc. But this
method does force a new timeline - which will be the same on each clone of the
database...

I also just don't think that it's always desirable to create a new timeline.

Greetings,

Andres Freund



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Mon, Oct 30, 2023 at 01:55:13PM -0400, Robert Haas wrote:
> I would encourage some caution here.

Thanks for chiming in here.

> In a vacuum, I'm in favor of this, and for the same reasons as you,
> namely, that the huge pile of Booleans that we use to control recovery
> is confusing, and it's difficult to make sure that all the code paths
> are adequately tested, and I think some of the things that actually
> work here are not documented.

Yep, same feeling here.

> But in practice, I think there is a possibility of something like this
> backfiring very hard. Notice that the first two people who commented
> on the thread saw the error and immediately removed backup_label even
> though that's 100% wrong. It shows how utterly willing users are to
> remove backup_label for any reason or no reason at all. If we convert
> cases where things would have worked into cases where people nuke
> backup_label and then it appears to work, we're going to be worse off
> in the long run, no matter how crazy the idea of removing backup_label
> may seem to us.

As far as I know, there's one paragraph in the docs that implies this
mode without giving an actual hint that this may be OK or not, so
shrug:
https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-TIPS
"As with base backups, the easiest way to produce a standalone hot
backup is to use the pg_basebackup tool. If you include the -X
parameter when calling it, all the write-ahead log required to use the
backup will be included in the backup automatically, and no special
action is required to restore the backup."

And a few lines down we imply the use of restore_command, something that
we verify is set only if recovery.signal is present.  See additionally
validateRecoveryParameters(), where the comments imply that
InArchiveRecovery would be set only when there's a restore command.

As you're telling me, and I've considered that as an option as well,
perhaps we should just consider the presence of a backup_label file
with no .signal files as a synonym of crash recovery?  In the recovery
path, currently the essence of the problem is when we do
InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning
that it should do archive recovery but we don't want it, and that does
not really make sense.  The rest of the code sort of implies that this
is not a supported combination.  So basically, my suggestion here is
to just replay WAL up to the end of what's in your local pg_wal/ and
hope for the best, without TLI jumps, except that we'd do nothing.
Doing a pg_basebackup -X stream followed by a restart would work fine
with that, because all the WAL is here.

A point of contention is if we'd better be stricter about satisfying
backupEndPoint in such a case, but the redo code only wants to do
something here when ArchiveRecoveryRequested is set (aka there's a
.signal file set), and we would not want a TLI jump at the end of
recovery, so I don't see an argument for caring about backupEndPoint
in this case.

At the end, I'm OK as long as ArchiveRecoveryRequested=false
InArchiveRecovery=true does not exist anymore, because it's much
easier to get what's going on with the redo path, IMHO.

(I have a patch at hand to show the idea, will post it with a reply to
Andres' message.)

> Also, Andres just recently mentioned to me that he uses this procedure
> of starting a server with a backup_label but no recovery.signal or
> standby.signal file regularly, and thinks other people do too. I was
> surprised, since I've never done that, except maybe when I was a noob
> and didn't have a clue. But Andres is far from a noob.

At this stage, that's at your own risk, as the code thinks it's OK to
force what's basically archive-recovery-without-being-it.  So it
mostly works, but it can also easily backfire..
--
Michael

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Mon, Oct 30, 2023 at 10:32:28AM -0600, Roberto Mello wrote:
> I realize the original use of "touch" is a valid shortcut for what I
> suggest above, however that will be less clear for the not-so-un*x-inclined
> users of Postgres, while for some it'll be downright confusing, IMHO. It
> also provides the advantage of being crystal clear on what needs to be done
> to fix the problem.

Indeed, "touch" may be better in this path if we'd throw an ERROR to
enforce a given policy, and that's more consistent with the rest of
the area.
--
Michael

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Mon, Oct 30, 2023 at 12:47:41PM -0700, Andres Freund wrote:
> I think the problem with these variables is that they're a really messy state
> machine - something this patch doesn't meaningfully improve IMO.

Okay.  Yes, this is my root issue as well.  We're at the stage where
we should reduce the possible set of combinations and the assumptions
we've invented because people can do undocumented stuff, then perhaps
refactor the code on top of that (say, if one combination of the two
booleans is not possible, switch to a three-state enum rather than two
bools, etc).

>> This configuration was possible when recovering from a base backup taken
>> by pg_basebackup without -R.  Note that the documentation requires at
>> least to set recovery.signal to restore from a backup, but the startup
>> process was not making this policy explicit.
>
> Maybe I just didn't check the right place, but from what I saw this is,
> at most, implied rather than explicitly stated.

See the doc reference here:
https://www.postgresql.org/message-id/ZUBM6BNQnEh7lzIM@paquier.xyz

So it kind of implies it, still also mentions restore_command.  It's
like Schrödinger's cat, yes and no at the same time.

> With -X ... we have all the necessary WAL locally, how does the workload on
> the primary matter? If you pass --no-slot, pg_basebackup might fail to fetch
> the necessary wal, but then you'd also have gotten an error.
>
> [...]
>
> Right now running pg_basebackup with -X stream, without --write-recovery-conf,
> gives you a copy of a cluster that will come up correctly as a distinct
> instance.
>
> [...]
>
> I also just don't think that it's always desirable to create a new timeline.

Yeah.  Another argument I was mentioning to Robert is that we may want
to treat the case where you have a backup_label without any signal
files just the same as crash recovery, replaying all the local
pg_wal/ and nothing else.  For example, something like the attached
should make sure that InArchiveRecovery=true is never set if
ArchiveRecoveryRequested is not set.

The attached would still cause redo to complain on a "WAL ends before
end of online backup" if not all the WAL is here (reason behind the
tweak of 010_pg_basebackup.pl, but the previous tweak to pg_rewind's
008_min_recovery_point.pl is not required here).

Attached is the idea I had in mind, in terms of code, FWIW.
--
Michael

Attachment
On Mon, Oct 30, 2023 at 8:40 PM Michael Paquier <michael@paquier.xyz> wrote:
> As far as I know, there's one paragraph in the docs that implies this
> mode without giving an actual hint that this may be OK or not, so
> shrug:
> https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-TIPS
> "As with base backups, the easiest way to produce a standalone hot
> backup is to use the pg_basebackup tool. If you include the -X
> parameter when calling it, all the write-ahead log required to use the
> backup will be included in the backup automatically, and no special
> action is required to restore the backup."

I see your point, but that's way too subtle. As far as I know, the
only actually-documented procedure for restoring is this one:

https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-PITR-RECOVERY

That procedure actually is badly in need of some updating, IMHO,
because close to half of it is about moving your existing database
cluster out of the way, which may or may not be needed in the case of
any particular backup restore. Also, it unconditionally mentions
creating recovery.signal, with no mention of standby.signal. And
certainly not with neither. It also gives zero motivation for actually
doing this and says nothing useful about backup_label.

Both recovery.signal and standby.signal are documented in
https://www.postgresql.org/docs/current/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY
but you'd have no real reason to look in a list of GUCs for
information about a file on disk. recovery.signal but not
standby.signal is mentioned in
https://www.postgresql.org/docs/current/warm-standby.html but nowhere
that I can find do we explicitly talk about running with at least one
of them.

> As you're telling me, and I've considered that as an option as well,
> perhaps we should just consider the presence of a backup_label file
> with no .signal files as a synonym of crash recovery?  In the recovery
> path, currently the essence of the problem is when we do
> InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning
> that it should do archive recovery but we don't want it, and that does
> not really make sense.  The rest of the code sort of implies that this
> is not a supported combination.  So basically, my suggestion here is
> to just replay WAL up to the end of what's in your local pg_wal/ and
> hope for the best, without TLI jumps, except that we'd do nothing.

This sentence seems to be incomplete.

But I was not saying we should treat the case where we have a
backup_label file like crash recovery. The real question here is why
we don't treat it fully like archive recovery. I don't know off-hand
what is different if I start the server with both backup_label and
recovery.signal vs. if I start it with only backup_label, but I
question whether there should be any difference at all.

> A point of contention is if we'd better be stricter about satisfying
> backupEndPoint in such a case, but the redo code only wants to do
> something here when ArchiveRecoveryRequested is set (aka there's a
> .signal file set), and we would not want a TLI jump at the end of
> recovery, so I don't see an argument with caring about backupEndPoint
> in this case.

This is a bit hard for me to understand, but I disagree strongly with
the idea that we should ever ignore a backup end point if we have one.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Tue, Oct 31, 2023 at 08:28:07AM -0400, Robert Haas wrote:
> On Mon, Oct 30, 2023 at 8:40 PM Michael Paquier <michael@paquier.xyz> wrote:
>> As far as I know, there's one paragraph in the docs that implies this
>> mode without giving an actual hint that this may be OK or not, so
>> shrug:
>> https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-TIPS
>> "As with base backups, the easiest way to produce a standalone hot
>> backup is to use the pg_basebackup tool. If you include the -X
>> parameter when calling it, all the write-ahead log required to use the
>> backup will be included in the backup automatically, and no special
>> action is required to restore the backup."
>
> I see your point, but that's way too subtle. As far as I know, the
> only actually-documented procedure for restoring is this one:
> https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-PITR-RECOVERY
>
> That procedure actually is badly in need of some updating, IMHO,
> because close to half of it is about moving your existing database
> cluster out of the way, which may or may not be needed in the case of
> any particular backup restore. Also, it unconditionally mentions
> creating recovery.signal, with no mention of standby.signal. And
> certainly not with neither. It also gives zero motivation for actually
> doing this and says nothing useful about backup_label.
>
> Both recovery.signal and standby.signal are documented in
> https://www.postgresql.org/docs/current/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY
> but you'd have no real reason to look in a list of GUCs for
> information about a file on disk. recovery.signal but not
> standby.signal is mentioned in
> https://www.postgresql.org/docs/current/warm-standby.html but nowhere
> that I can find do we explicitly talk about running with at least one
> of them.

Point 7 of what you quote says to use one?  True that this needs a
refresh, and perhaps a big fat warning about the fact that these are
required if you want to fetch WAL from sources other than the local
pg_wal/.  Perhaps there may be a point of revisiting the default
behavior of recovery_target_timeline in this case, I don't know.

>> As you're telling me, and I've considered that as an option as well,
>> perhaps we should just consider the presence of a backup_label file
>> with no .signal files as a synonym of crash recovery?  In the recovery
>> path, currently the essence of the problem is when we do
>> InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning
>> that it should do archive recovery but we don't want it, and that does
>> not really make sense.  The rest of the code sort of implies that this
>> is not a supported combination.  So basically, my suggestion here is
>> to just replay WAL up to the end of what's in your local pg_wal/ and
>> hope for the best, without TLI jumps, except that we'd do nothing.
>
> This sentence seems to be incomplete.

I've re-read it, and it looks OK to me.  What I mean by this
paragraph is two things:
- Remove InArchiveRecovery=true and ArchiveRecoveryRequested=false as
a possible combination in the code.
- Treat backup_label with no .signal file as the same as crash
recovery, that:
-- Does no TLI jump at the end of recovery.
-- Expects all the WAL to be in pg_wal/.

> But I was not saying we should treat the case where we have a
> backup_label file like crash recovery. The real question here is why
> we don't treat it fully like archive recovery.

Timeline jump at the end of recovery?  Archive recovery forces a TLI
jump by default at the end of redo if there's a signal file, and some
users may not want a TLI jump by default?

> I don't know off-hand
> what is different if I start the server with both backup_label and
> recovery.signal vs. if I start it with only backup_label, but I
> question whether there should be any difference at all.

Perhaps we could do that, but note that backup_label is renamed to
backup_label.old at the beginning of redo.  The code has historically
always enforced InArchiveRecovery=true when there's a backup label,
and InArchiveRecovery=false where there is no backup label, so we
don't get the same recovery behavior if a cluster is restarted while
it was still performing recovery.  I don't quite see how it is
possible to make this code simpler without enforcing a policy to take
care of this inconsistency.  I've listed two of them on this thread:
- Force the presence of a .signal file when there is a
backup_label, to force archive recovery.
- Force crash recovery if there are no signal files but a
backup_label, then a restart of a cluster that began a restore while
it processed a backup would be confused: should it do crash recovery
or archive recovery?

My guess, based on the feedback of this thread, is that it could be
more helpful to do the second of these rather than a third, hybrid
option, because this is better with standalone backups: no TLI jumps,
and the restore happens with all the local WAL in pg_wal/, without any
GUCs to control how recovery should run.

You are suggesting a third, hybrid, approach.  Now note we have always
checked for signal files before the backup_label.  Recovery GUCs are
checked only if there's one of the two signal files.  It seems to me
that what you are suggesting would make the code a bit harder to
follow, actually, and more inconsistent with stable branches because
we would need to check the control file contents *before* checking for
the .signal files or backup_label to be able to see if archive
recovery *should* happen, depending on if there's a backupEndPoint.

>> A point of contention is if we'd better be stricter about satisfying
>> backupEndPoint in such a case, but the redo code only wants to do
>> something here when ArchiveRecoveryRequested is set (aka there's a
>> .signal file set), and we would not want a TLI jump at the end of
>> recovery, so I don't see an argument for caring about backupEndPoint
>> in this case.
>
> This is a bit hard for me to understand, but I disagree strongly with
> the idea that we should ever ignore a backup end point if we have one.

Actually, while experimenting yesterday before sending my reply to
you, I have noticed that redo cares about backupEndPoint even if you
force crash recovery when there's only a backup_label file.  There's a
case in pg_basebackup that would fail, but that's accidental, AFAIK:
https://www.postgresql.org/message-id/ZUBVKfL6FR6NOQyt%40paquier.xyz

See in StartupXLOG(), around the comment "complain if we did not roll
forward far enough to reach".  This complains if archive recovery has
been requested *or* if we retrieved a backup end LSN from the
backup_label.
--
Michael

Attachment

Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Kyotaro Horiguchi
Date:
At Wed, 1 Nov 2023 08:39:17 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> See in StartupXLOG(), around the comment "complain if we did not roll
> forward far enough to reach".  This complains if archive recovery has
> been requested *or* if we retrieved a backup end LSN from the
> backup_label.

Please note that backupStartPoint is not reset even when reaching the
backup end point during crash recovery. If backup_label enforces
archive recovery, I think this point won't be an issue as you
mentioned. For the record, my earlier proposal aimed to detect
reaching the end point even during crash recovery.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Thu, Nov 02, 2023 at 11:03:35AM +0900, Kyotaro Horiguchi wrote:
> At Wed, 1 Nov 2023 08:39:17 +0900, Michael Paquier <michael@paquier.xyz> wrote in
>> See in StartupXLOG(), around the comment "complain if we did not roll
>> forward far enough to reach".  This complains if archive recovery has
>> been requested *or* if we retrieved a backup end LSN from the
>> backup_label.
>
> Please note that backupStartPoint is not reset even when reaching the
> backup end point during crash recovery. If backup_label enforces
> archive recovery, I think this point won't be an issue as you
> mentioned. For the record, my earlier proposal aimed to detect
> reaching the end point even during crash recovery.

Good point.  Not doing ReachedEndOfBackup() at the end of crash
recovery feels inconsistent, especially since we care about some of
these fields in this case.

If a .signal file is required when we read a backup_label, yes that
would not be a problem because we'd always link backupEndPoint's
destiny with a requested archive recovery, but there seems to be little
love for enforcing that, based on the feedback on this thread, so..
--
Michael

On Tue, Oct 31, 2023 at 7:39 PM Michael Paquier <michael@paquier.xyz> wrote:
> Point 7. of what you quote says to use one?  True that this needs a
> refresh, and perhaps a big fat warning about the fact that these are
> required if you want to fetch WAL from other sources than the local
> pg_wal/.  Perhaps there may be a point of revisiting the default
> behavior of recovery_target_timeline in this case, I don't know.

I don't really know what to say to this -- sure, point 7 of
"Recovering Using a Continuous Archive Backup" says to use
recovery.signal. But as I said in the preceding paragraph, it doesn't
say either "use recovery.signal or standby.signal". Nor does it or
anything else in the documentation explain under what circumstances
you're allowed to have neither. So the whole thing is very unclear.

> >> As you're telling me, and I've considered that as an option as well,
> >> perhaps we should just consider the presence of a backup_label file
> >> with no .signal files as a synonym of crash recovery?  In the recovery
> >> path, currently the essence of the problem is when we do
> >> InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning
> >> that it should do archive recovery but we don't want it, and that does
> >> not really make sense.  The rest of the code sort of implies that this
> >> is not a supported combination.  So basically, my suggestion here, is
> >> to just replay WAL up to the end of what's in your local pg_wal/ and
> >> hope for the best, without TLI jumps, except that we'd do nothing.
> >
> > This sentence seems to be incomplete.
>
> I've re-read it, and it looks OK to me.

Well, the sentence ends with "except that we'd do nothing" and I don't
know what that means. It would make sense to me if it said "except
that we'd do nothing about <whatever>" or "except that we'd do nothing
instead of <something>" but as you've written it basically seems to
boil down to "my suggestion is to replay WAL except do nothing" which
makes no sense. If you replay WAL, you're not doing nothing.

> > But I was not saying we should treat the case where we have a
> > backup_label file like crash recovery. The real question here is why
> > we don't treat it fully like archive recovery.
>
> Timeline jump at the end of recovery?  Archive recovery forces a TLI
> jump by default at the end of redo if there's a signal file, and some
> users may not want a TLI jump by default?

Uggh. I don't know what to think about that. I bet some people do want
that, but that makes it pretty easy to end up with multiple copies of
the same cluster running on the same TLI, too, which is not a thing
that you really want to have happen.

At the end of the day, I'm coming around to the view that the biggest
problem here is the documentation. Nobody can really know what's
supposed to work right now because the documentation doesn't say which
things you are and are not allowed to do and what results you should
expect in each case. If it did, it would be easier to discuss possible
behavior changes. Right now, it's hard to change any code at all,
because there's no list of supported scenarios, so you can't tell
whether a potential change affects a scenario that somebody thinks
should work, or only cases that nobody can possibly care about. It's
sort of possible to reason your way through that, to an extent, but
it's pretty hard. The fact that I didn't know that starting from a
backup with neither recovery.signal nor standby.signal was a thing
that anybody did or cared about is good evidence of that.

I'm coming to the understanding that we have four supported scenarios.
One, no backup_label, no recovery.signal, and no standby.signal.
Hence, replay WAL until the end, then start up. Two, backup_label
exists but neither recovery.signal nor standby.signal does. As before,
but if I understand correctly, now we can check that we reached the
backup end location. Three, recovery.signal exists, with or without
backup_label. Now we create a new TLI at the end of recovery, and
also, we can now fetch WAL that is not present in pg_wal using
primary_conninfo or restore_command. In fact, I think we may prefer to
do that over using WAL we have locally, but I'm not quite sure about
that. Fourth, standby.signal exists, with or without backup_label. As
the previous scenario, but now when we reach the end of WAL we wait
for more to appear instead of ending recovery. I have a feeling this
is not quite an exhaustive list of differences between the various
modes, and I'm not even sure that it lists all of the things someone
might try to do. Thoughts?
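For what it's worth, the four scenarios above can be written out as a small decision function. This is only a sketch of the enumeration in this email, not PostgreSQL code; the enum and function names are made up. It does encode one detail I believe is documented, namely that standby.signal takes precedence when both signal files are present.

```c
#include <stdbool.h>

/* Made-up enumeration of the four scenarios described above; none of
 * these names exist in PostgreSQL. */
typedef enum
{
    CRASH_RECOVERY,          /* no backup_label, no signal files */
    BACKUP_CRASH_RECOVERY,   /* backup_label only; check backup end LSN */
    ARCHIVE_RECOVERY,        /* recovery.signal; new TLI, external WAL */
    STANDBY_MODE             /* standby.signal; wait for more WAL */
} RecoveryScenario;

RecoveryScenario
classify_recovery(bool has_backup_label,
                  bool has_recovery_signal,
                  bool has_standby_signal)
{
    /* standby.signal wins when both signal files are present. */
    if (has_standby_signal)
        return STANDBY_MODE;
    if (has_recovery_signal)
        return ARCHIVE_RECOVERY;
    return has_backup_label ? BACKUP_CRASH_RECOVERY : CRASH_RECOVERY;
}
```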

I also feel like the terminology here sometimes obscures more than it
illuminates. For instance, it seems like ArchiveRecoveryRequested
really means "are any signal files present?" while InArchiveRecovery
means "are we fetching WAL from outside pg_wal rather than using
what's in pg_wal?". But these are not obvious from the names, and
sometimes we have additional variables with overlapping meanings, like
readSource, which indicates whether we're reading from pg_wal, the
archive, or the walreceiver, and yet is probably not redundant with
InArchiveRecovery. In any event, I think that we need to start with
the question of what behavior(s) we want to expose to users, and then
back into the question of what internal variables and states need to
exist in order to support that behavior. We cannot start by deciding
what variables we'd like to get rid of and then trying to justify the
resulting behavior changes on the grounds that they simplify the code.
Users aren't going to like that, hackers aren't going to like that,
and the resulting behavior probably won't be anything great.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Wed, Nov 08, 2023 at 01:16:58PM -0500, Robert Haas wrote:
> On Tue, Oct 31, 2023 at 7:39 PM Michael Paquier <michael@paquier.xyz> wrote:
>>>> As you're telling me, and I've considered that as an option as well,
>>>> perhaps we should just consider the presence of a backup_label file
>>>> with no .signal files as a synonym of crash recovery?  In the recovery
>>>> path, currently the essence of the problem is when we do
>>>> InArchiveRecovery=true, but ArchiveRecoveryRequested=false, meaning
>>>> that it should do archive recovery but we don't want it, and that does
>>>> not really make sense.  The rest of the code sort of implies that this
>>>> is not a supported combination.  So basically, my suggestion here, is
>>>> to just replay WAL up to the end of what's in your local pg_wal/ and
>>>> hope for the best, without TLI jumps, except that we'd do nothing.
>>>
>>> This sentence seems to be incomplete.
>>
>> I've re-read it, and it looks OK to me.
>
> Well, the sentence ends with "except that we'd do nothing" and I don't
> know what that means. It would make sense to me if it said "except
> that we'd do nothing about <whatever>" or "except that we'd do nothing
> instead of <something>" but as you've written it basically seems to
> boil down to "my suggestion is to replay WAL except do nothing" which
> makes no sense. If you replay WAL, you're not doing nothing.

Sure, sorry for the confusion.  By "we'd do nothing", I mean precisely
"to take no specific action related to archive recovery and recovery
parameters at the end of recovery", meaning that a combination of
backup_label with no signal file would be the same as crash recovery,
replaying WAL up to the end of what can be found in pg_wal/, and only
that.

>>> But I was not saying we should treat the case where we have a
>>> backup_label file like crash recovery. The real question here is why
>>> we don't treat it fully like archive recovery.
>>
>> Timeline jump at the end of recovery?  Archive recovery forces a TLI
>> jump by default at the end of redo if there's a signal file, and some
>> users may not want a TLI jump by default?
>
> Uggh. I don't know what to think about that. I bet some people do want
> that, but that makes it pretty easy to end up with multiple copies of
> the same cluster running on the same TLI, too, which is not a thing
> that you really want to have happen.

Andres has mentioned upthread that this is something he's been using
to quickly be able to clone a cluster.  I would not recommend doing
that, personally, but if that's useful in some cases, well, why not.

> At the end of the day, I'm coming around to the view that the biggest
> problem here is the documentation. Nobody can really know what's
> supposed to work right now because the documentation doesn't say which
> things you are and are not allowed to do and what results you should
> expect in each case. If it did, it would be easier to discuss possible
> behavior changes. Right now, it's hard to change any code at all,
> because there's no list of supported scenarios, so you can't tell
> whether a potential change affects a scenario that somebody thinks
> should work, or only cases that nobody can possibly care about. It's
> sort of possible to reason your way through that, to an extent, but
> it's pretty hard. The fact that I didn't know that starting from a
> backup with neither recovery.signal nor standby.signal was a thing
> that anybody did or cared about is good evidence of that.

That's one problem, but not all of it, because the code makes extra
assumptions on top of that.

> I also feel like the terminology here sometimes obscures more than it
> illuminates. For instance, it seems like ArchiveRecoveryRequested
> really means "are any signal files present?" while InArchiveRecovery
> means "are we fetching WAL from outside pg_wal rather than using
> what's in pg_wal?". But these are not obvious from the names, and
> sometimes we have additional variables with overlapping meanings, like
> readSource, which indicates whether we're reading from pg_wal, the
> archive, or the walreceiver, and yet is probably not redundant with
> InArchiveRecovery. In any event, I think that we need to start with
> the question of what behavior(s) we want to expose to users, and then
> back into the question of what internal variables and states need to
> exist in order to support that behavior. We cannot start by deciding
> what variables we'd like to get rid of and then trying to justify the
> resulting behavior changes on the grounds that they simplify the code.
> Users aren't going to like that, hackers aren't going to like that,
> and the resulting behavior probably won't be anything great.

Note as well that InArchiveRecovery is set when there's a
backup_label, but that the code would check for the existence of a
restore_command only if a signal file exists.  That's strange, but if
people have been relying on this behavior, so be it.

At this stage, it looks pretty clear to me that there's no consensus
on what to do, and nobody's happy with the proposal of this thread, so
I am going to mark it as rejected.
--
Michael


Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Thu, Nov 09, 2023 at 12:04:19PM +0900, Michael Paquier wrote:
> Sure, sorry for the confusion.  By "we'd do nothing", I mean precisely
> "to take no specific action related to archive recovery and recovery
> parameters at the end of recovery", meaning that a combination of
> backup_label with no signal file would be the same as crash recovery,
> replaying WAL up to the end of what can be found in pg_wal/, and only
> that.

To be slightly more precise: I also mean to fail recovery if it is
not possible to replay up to the end-of-backup LSN marked in the label
file because we are missing some stuff in pg_wal/, which is something
that the code is currently able to handle.
--
Michael

Hi,

On 2023-11-09 12:16:52 +0900, Michael Paquier wrote:
> On Thu, Nov 09, 2023 at 12:04:19PM +0900, Michael Paquier wrote:
> > Sure, sorry for the confusion.  By "we'd do nothing", I mean precisely
> > "to take no specific action related to archive recovery and recovery
> > parameters at the end of recovery", meaning that a combination of
> > backup_label with no signal file would be the same as crash recovery,
> > replaying WAL up to the end of what can be found in pg_wal/, and only
> > that.

I don't think those are equivalent - in the "backup_label with no signal file"
case we start recovery at a different location than the "crash recovery" case
does.


> By being slightly more precise.  I also mean to fail recovery if it is
> not possible to replay up to the end-of-backup LSN marked in the label
> file because we are missing some stuff in pg_wal/, which is something
> that the code is currently able to handle.

"able to handle" as in detect and error out? Because that's the only possible
sane thing to do, correct?

Greetings,

Andres Freund



Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

From
Michael Paquier
Date:
On Mon, Nov 13, 2023 at 03:41:44PM -0800, Andres Freund wrote:
> On 2023-11-09 12:16:52 +0900, Michael Paquier wrote:
>> On Thu, Nov 09, 2023 at 12:04:19PM +0900, Michael Paquier wrote:
>> > Sure, sorry for the confusion.  By "we'd do nothing", I mean precisely
>> > "to take no specific action related to archive recovery and recovery
>> > parameters at the end of recovery", meaning that a combination of
>> > backup_label with no signal file would be the same as crash recovery,
>> > replaying WAL up to the end of what can be found in pg_wal/, and only
>> > that.
>
> I don't think those are equivalent - in the "backup_label with no signal file"
> case we start recovery at a different location than the "crash recovery" case
> does.

It depends on how you see things, and based on my read of the thread
and the code, we've never really put a clear definition on what a
"backup_label with no signal file" should do.  The definition I was
suggesting is to make it work the same way as crash recovery
internally:
- use the start LSN from the backup_label.
- replay up to the end of local WAL.
- don't rely on any recovery GUCs.
- if at the end of recovery replay has not reached the end-of-backup
record, then fail.
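
As a sketch, the proposed rule boils down to the check below. All
names are hypothetical; the real logic would live in the startup
process, and redo would begin at the label's start LSN (unlike plain
crash recovery, which starts at the last checkpoint).

```c
#include <stdbool.h>

/* Illustrative stand-in for PostgreSQL's WAL pointer type. */
typedef unsigned long long XLogRecPtr;

/*
 * Sketch of the proposed semantics for backup_label with no signal
 * file: consume only local pg_wal/, consult no recovery GUCs, and
 * succeed only if replay reached at least the end-of-backup record.
 */
bool
backup_label_crash_recovery_ok(XLogRecPtr backupEndPoint,
                               XLogRecPtr endOfLocalWAL)
{
    /* The only hard requirement is that local WAL extends at least
     * to the end of the backup; otherwise recovery should fail. */
    return endOfLocalWAL >= backupEndPoint;
}
```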

>> By being slightly more precise.  I also mean to fail recovery if it is
>> not possible to replay up to the end-of-backup LSN marked in the label
>> file because we are missing some stuff in pg_wal/, which is something
>> that the code is currently able to handle.
>
> "able to handle" as in detect and error out? Because that's the only possible
> sane thing to do, correct?

By "able to handle", I mean to detect that the expected LSN has not
been reached and FATAL, or fail recovery.  So yes.
--
Michael

Hi,

On 2023-11-14 09:13:44 +0900, Michael Paquier wrote:
> On Mon, Nov 13, 2023 at 03:41:44PM -0800, Andres Freund wrote:
> > On 2023-11-09 12:16:52 +0900, Michael Paquier wrote:
> >> On Thu, Nov 09, 2023 at 12:04:19PM +0900, Michael Paquier wrote:
> >> > Sure, sorry for the confusion.  By "we'd do nothing", I mean precisely
> >> > "to take no specific action related to archive recovery and recovery
> >> > parameters at the end of recovery", meaning that a combination of
> >> > backup_label with no signal file would be the same as crash recovery,
> >> > replaying WAL up to the end of what can be found in pg_wal/, and only
> >> > that.
> > 
> > I don't think those are equivalent - in the "backup_label with no signal file"
> > case we start recovery at a different location than the "crash recovery" case
> > does.
> 
> It depends on how you see things, and based on my read of the thread
> and the code, we've never really put a clear definition on what a
> "backup_label with no signal file" should do.  The definition I was
> suggesting is to make it work the same way as crash recovery
> internally:
> - use the start LSN from the backup_label.

That's fundamentally different from crash recovery!

> - replay up to the end of local WAL.
> - don't rely on any recovery GUCs.
> - if at the end of recovery replay has not reached the end-of-backup
> record, then fail.

Also different from crash recovery.

It doesn't make sense to me to say "work the same way" when there are such
fundamental differences.

Greetings,

Andres Freund