Thread: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
Hi,

In standby mode, the state machine in WaitForWALToBecomeAvailable()
reads WAL from pg_wal after failing to read from the archive. This is
currently implemented in XLogFileReadAnyTLI() by calling
XLogFileRead() with source XLOG_FROM_PG_WAL after it fails with source
XLOG_FROM_PG_ARCHIVE and the current source isn't changed at all.
Also, passing the source to XLogFileReadAnyTLI() in
WaitForWALToBecomeAvailable() isn't straight i.e. it's not necessary
to pass in XLOG_FROM_ANY at all. These things make the state machine a
bit complicated and hard to understand.

The attached patch attempts to simplify the code a bit by changing the
current source to XLOG_FROM_PG_WAL after failing in
XLOG_FROM_PG_ARCHIVE so that the state machine can move smoothly to
read from pg_wal. And we can just pass the current source to
XLogFileReadAnyTLI(). It also enables us to reduce a bit of extra
XLogFileRead() code in XLogFileReadAnyTLI().

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Amul Sul
Date:
On Tue, Oct 18, 2022 at 12:01 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> In standby mode, the state machine in WaitForWALToBecomeAvailable()
> reads WAL from pg_wal after failing to read from the archive. This is
> currently implemented in XLogFileReadAnyTLI() by calling
> XLogFileRead() with source XLOG_FROM_PG_WAL after it fails with source
> XLOG_FROM_PG_ARCHIVE and the current source isn't changed at all.
> Also, passing the source to XLogFileReadAnyTLI() in
> WaitForWALToBecomeAvailable() isn't straight i.e. it's not necessary
> to pass in XLOG_FROM_ANY at all. These things make the state machine a
> bit complicated and hard to understand.
>
> The attached patch attempts to simplify the code a bit by changing the
> current source to XLOG_FROM_PG_WAL after failing in
> XLOG_FROM_PG_ARCHIVE so that the state machine can move smoothly to
> read from pg_wal. And we can just pass the current source to
> XLogFileReadAnyTLI(). It also enables us to reduce a bit of extra
> XLogFileRead() code in XLogFileReadAnyTLI().
>
> Thoughts?

+1

Regards,
Amul



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Tue, Oct 18, 2022 at 1:03 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Tue, Oct 18, 2022 at 12:01 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > In standby mode, the state machine in WaitForWALToBecomeAvailable()
> > reads WAL from pg_wal after failing to read from the archive. This is
> > currently implemented in XLogFileReadAnyTLI() by calling
> > XLogFileRead() with source XLOG_FROM_PG_WAL after it fails with source
> > XLOG_FROM_PG_ARCHIVE and the current source isn't changed at all.
> > Also, passing the source to XLogFileReadAnyTLI() in
> > WaitForWALToBecomeAvailable() isn't straight i.e. it's not necessary
> > to pass in XLOG_FROM_ANY at all. These things make the state machine a
> > bit complicated and hard to understand.
> >
> > The attached patch attempts to simplify the code a bit by changing the
> > current source to XLOG_FROM_PG_WAL after failing in
> > XLOG_FROM_PG_ARCHIVE so that the state machine can move smoothly to
> > read from pg_wal. And we can just pass the current source to
> > XLogFileReadAnyTLI(). It also enables us to reduce a bit of extra
> > XLogFileRead() code in XLogFileReadAnyTLI().
> >
> > Thoughts?
>
> +1

Thanks. Let's see what others think about it. I will add a CF entry in a while.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Nathan Bossart
Date:
On Tue, Oct 18, 2022 at 12:01:07PM +0530, Bharath Rupireddy wrote:
> The attached patch attempts to simplify the code a bit by changing the
> current source to XLOG_FROM_PG_WAL after failing in
> XLOG_FROM_PG_ARCHIVE so that the state machine can move smoothly to
> read from pg_wal. And we can just pass the current source to
> XLogFileReadAnyTLI(). It also enables us to reduce a bit of extra
> XLogFileRead() code in XLogFileReadAnyTLI().

This looks correct to me.  The only thing that stood out to me was the loop
through 'tles' in XLogFileReadyAnyTLI.  With this change, we'd loop through
the timelines for both XLOG_FROM_PG_ARCHIVE and XLOG_FROM_PG_WAL, whereas
now we only loop through the timelines once.  However, I doubt this makes
much difference in practice.  You'd only do the extra loop whenever
restoring from the archives failed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Michael Paquier
Date:
On Fri, Dec 30, 2022 at 10:32:57AM -0800, Nathan Bossart wrote:
> This looks correct to me.  The only thing that stood out to me was the loop
> through 'tles' in XLogFileReadyAnyTLI.  With this change, we'd loop through
> the timelines for both XLOG_FROM_PG_ARCHIVE and XLOG_FROM_PG_WAL, whereas
> now we only loop through the timelines once.  However, I doubt this makes
> much difference in practice.  You'd only do the extra loop whenever
> restoring from the archives failed.

        case XLOG_FROM_ARCHIVE:
+
+           /*
+            * After failing to read from archive, we try to read from
+            * pg_wal.
+            */
+           currentSource = XLOG_FROM_PG_WAL;
+           break;
In standby mode, the priority lookup order is pg_wal -> archive ->
stream.  With this change, we would do pg_wal -> archive -> pg_wal ->
stream, meaning that it could influence some recovery scenarios while
involving more lookups than necessary to the local pg_wal/ directory?

See, on failure where the current source is XLOG_FROM_ARCHIVE, we
would not switch anymore directly to XLOG_FROM_STREAM.
--
Michael

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Tue, Jan 3, 2023 at 7:47 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Dec 30, 2022 at 10:32:57AM -0800, Nathan Bossart wrote:
> > This looks correct to me.  The only thing that stood out to me was the loop
> > through 'tles' in XLogFileReadyAnyTLI.  With this change, we'd loop through
> > the timelines for both XLOG_FROM_PG_ARCHIVE and XLOG_FROM_PG_WAL, whereas
> > now we only loop through the timelines once.  However, I doubt this makes
> > much difference in practice.  You'd only do the extra loop whenever
> > restoring from the archives failed.
>
>         case XLOG_FROM_ARCHIVE:
> +
> +           /*
> +            * After failing to read from archive, we try to read from
> +            * pg_wal.
> +            */
> +           currentSource = XLOG_FROM_PG_WAL;
> +           break;
> In standby mode, the priority lookup order is pg_wal -> archive ->
> stream.  With this change, we would do pg_wal -> archive -> pg_wal ->
> stream, meaning that it could influence some recovery scenarios while
> involving more lookups than necessary to the local pg_wal/ directory?
>
> See, on failure where the current source is XLOG_FROM_ARCHIVE, we
> would not switch anymore directly to XLOG_FROM_STREAM.

I think there's a bit of disconnect here - here's what I understand:

Standby when started can either enter to crash recovery (if it is a
restart after crash) or enter to archive recovery directly.

The standby, when in crash recovery:
currentSource is set to XLOG_FROM_PG_WAL in
WaitForWALToBecomeAvailable() and it continues to exhaust replaying
all the WAL in the pg_wal directory.
After all the pg_wal is exhausted during crash recovery, currentSource
is set to XLOG_FROM_ANY in ReadRecord() and the standby enters archive
recovery mode (see below).

The standby, when in archive recovery:
In WaitForWALToBecomeAvailable() currentSource is set to
XLOG_FROM_ARCHIVE and it enters XLogFileReadAnyTLI() - first tries to
fetch WAL from archive and returns if succeeds otherwise tries to
fetch from pg_wal and returns if succeeds, otherwise returns with
failure.
If failure is returned from XLogFileReadAnyTLI(), change the
currentSource to XLOG_FROM_STREAM.
If a failure in XLOG_FROM_STREAM, the currentSource is set to
XLOG_FROM_ARCHIVE and XLogFileReadAnyTLI() is called again.

Note that the standby exits from this WaitForWALToBecomeAvailable()
state machine when the promotion signal is detected and before which
all the wal from archive -> pg_wal is exhausted.

Note that currentSource is set to XLOG_FROM_PG_WAL in
WaitForWALToBecomeAvailable() only after the server exits archive
recovery i.e. InArchiveRecovery is set to false in
FinishWalRecovery(). However, exhausting pg_wal for recovery is built
inherently within XLogFileReadAnyTLI().

In summary:
the flow when the standby is in crash recovery is pg_wal -> [archive
-> pg_wal -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...
the flow when the standby is in archive recovery is [archive -> pg_wal
-> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...

The proposed patch makes the inherent state change to pg_wal after
failure to read from archive in XLogFileReadAnyTLI() to explicit by
setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
think it doesn't alter the existing state machine or add any new extra
lookups in pg_wal.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Sat, Dec 31, 2022 at 12:03 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 18, 2022 at 12:01:07PM +0530, Bharath Rupireddy wrote:
> > The attached patch attempts to simplify the code a bit by changing the
> > current source to XLOG_FROM_PG_WAL after failing in
> > XLOG_FROM_PG_ARCHIVE so that the state machine can move smoothly to
> > read from pg_wal. And we can just pass the current source to
> > XLogFileReadAnyTLI(). It also enables us to reduce a bit of extra
> > XLogFileRead() code in XLogFileReadAnyTLI().
>
> This looks correct to me.  The only thing that stood out to me was the loop
> through 'tles' in XLogFileReadyAnyTLI.  With this change, we'd loop through
> the timelines for both XLOG_FROM_PG_ARCHIVE and XLOG_FROM_PG_WAL, whereas
> now we only loop through the timelines once.  However, I doubt this makes
> much difference in practice.  You'd only do the extra loop whenever
> restoring from the archives failed.

Right. With the patch, we'd loop again through the tles after a
failure from the archive. Since the curFileTLI isn't changed unless a
successful read, we'd read from pg_wal with tli where we earlier left
off reading from the archive. I'm not sure if this extra looping is
worth worrying about.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Nathan Bossart
Date:
On Tue, Jan 03, 2023 at 02:53:10PM +0530, Bharath Rupireddy wrote:
> In summary:
> the flow when the standby is in crash recovery is pg_wal -> [archive
> -> pg_wal -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...
> the flow when the standby is in archive recovery is [archive -> pg_wal
> -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...

This is my understanding as well.
 
> The proposed patch makes the inherent state change to pg_wal after
> failure to read from archive in XLogFileReadAnyTLI() to explicit by
> setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
> think it doesn't alter the existing state machine or add any new extra
> lookups in pg_wal.

I'm assuming this change would simplify your other patch that modifieѕ
WaitForWALToBecomeAvailable() [0].  Is that correct?

[0] https://commitfest.postgresql.org/41/3663/

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Wed, Jan 4, 2023 at 12:03 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Jan 03, 2023 at 02:53:10PM +0530, Bharath Rupireddy wrote:
> > In summary:
> > the flow when the standby is in crash recovery is pg_wal -> [archive
> > -> pg_wal -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...
> > the flow when the standby is in archive recovery is [archive -> pg_wal
> > -> stream] -> [archive -> pg_wal -> stream] -> [] -> [] ...
>
> This is my understanding as well.
>
> > The proposed patch makes the inherent state change to pg_wal after
> > failure to read from archive in XLogFileReadAnyTLI() to explicit by
> > setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
> > think it doesn't alter the existing state machine or add any new extra
> > lookups in pg_wal.
>
> I'm assuming this change would simplify your other patch that modifieѕ
> WaitForWALToBecomeAvailable() [0].  Is that correct?
>
> [0] https://commitfest.postgresql.org/41/3663/

Yes, it does simplify the other feature patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Michael Paquier
Date:
On Tue, Jan 03, 2023 at 02:53:10PM +0530, Bharath Rupireddy wrote:
> The proposed patch makes the inherent state change to pg_wal after
> failure to read from archive in XLogFileReadAnyTLI() to explicit by
> setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
> think it doesn't alter the existing state machine or add any new extra
> lookups in pg_wal.

Well, did you notice 4d894b41?  It introduced this change:

-               readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
+               readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
+                       currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
+                                        currentSource);

And this patch basically undoes that, meaning that we would basically
look at the archives first for all the expected TLIs, but only if no
files were found in pg_wal/.

The change is subtle, see XLogFileReadAnyTLI().  On HEAD we go through
each timeline listed and check both archives and then pg_wal/ after
the last source that failed was the archives.  The patch does
something different: it checks all the timelines for the archives,
then all the timelines in pg_wal/ with two separate calls to
XLogFileReadAnyTLI().
--
Michael

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Wed, Mar 1, 2023 at 1:46 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jan 03, 2023 at 02:53:10PM +0530, Bharath Rupireddy wrote:
> > The proposed patch makes the inherent state change to pg_wal after
> > failure to read from archive in XLogFileReadAnyTLI() to explicit by
> > setting currentSource to XLOG_FROM_PG_WAL in the state machine. I
> > think it doesn't alter the existing state machine or add any new extra
> > lookups in pg_wal.
>
> Well, did you notice 4d894b41?  It introduced this change:
>
> -               readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
> +               readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
> +                       currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
> +                                        currentSource);
>
> And this patch basically undoes that, meaning that we would basically
> look at the archives first for all the expected TLIs, but only if no
> files were found in pg_wal/.
>
> The change is subtle, see XLogFileReadAnyTLI().  On HEAD we go through
> each timeline listed and check both archives and then pg_wal/ after
> the last source that failed was the archives.  The patch does
> something different: it checks all the timelines for the archives,
> then all the timelines in pg_wal/ with two separate calls to
> XLogFileReadAnyTLI().

Thanks. Yeah, the patch proposed here just reverts that commit [1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4d894b41cd12179b710526eba9dc62c2b99abc4d.

That commit fixes an issue - "If there is a WAL segment with same ID
but different TLI present in both the WAL archive and pg_xlog, prefer
the one with higher TLI.".

I will withdraw the patch proposed in this thread.

[1]
commit 4d894b41cd12179b710526eba9dc62c2b99abc4d
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Fri Feb 14 15:15:09 2014 +0200

    Change the order that pg_xlog and WAL archive are polled for WAL segments.

    If there is a WAL segment with same ID but different TLI present in both
    the WAL archive and pg_xlog, prefer the one with higher TLI. Before this
    patch, the archive was polled first, for all expected TLIs, and only if no
    file was found was pg_xlog scanned. This was a change in behavior from 9.3,
    which first scanned archive and pg_xlog for the highest TLI, then archive
    and pg_xlog for the next highest TLI and so forth. This patch reverts the
    behavior back to what it was in 9.2.

    The reason for this is that if for example you try to do archive recovery
    to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is
    not archived yet, we would replay past the timeline switch point on
    timeline 1 using the archived files, before even looking timeline 2's files
    in pg_xlog

    Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior
    was changed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Michael Paquier
Date:
On Fri, Mar 03, 2023 at 01:38:38PM +0530, Bharath Rupireddy wrote:
> I will withdraw the patch proposed in this thread.

Okay, thanks for confirming.
--
Michael

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Nathan Bossart
Date:
On Fri, Mar 03, 2023 at 01:38:38PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 1, 2023 at 1:46 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Well, did you notice 4d894b41?  It introduced this change:
>>
>> -               readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, currentSource);
>> +               readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
>> +                       currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
>> +                                        currentSource);
>>
>> And this patch basically undoes that, meaning that we would basically
>> look at the archives first for all the expected TLIs, but only if no
>> files were found in pg_wal/.
>>
>> The change is subtle, see XLogFileReadAnyTLI().  On HEAD we go through
>> each timeline listed and check both archives and then pg_wal/ after
>> the last source that failed was the archives.  The patch does
>> something different: it checks all the timelines for the archives,
>> then all the timelines in pg_wal/ with two separate calls to
>> XLogFileReadAnyTLI().
> 
> Thanks. Yeah, the patch proposed here just reverts that commit [1]
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4d894b41cd12179b710526eba9dc62c2b99abc4d.
> 
> That commit fixes an issue - "If there is a WAL segment with same ID
> but different TLI present in both the WAL archive and pg_xlog, prefer
> the one with higher TLI.".

Given both Bharath and I missed this, perhaps we should add a comment about
this behavior.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Michael Paquier
Date:
On Fri, Mar 03, 2023 at 04:33:39PM -0800, Nathan Bossart wrote:
> Given both Bharath and I missed this, perhaps we should add a comment about
> this behavior.

Makes sense to me to document that in a better way.  What about the
addition of a short paragraph at the top of XLogFileReadAnyTLI() that
explains the behaviors we expect depending on the values of
XLogSource?  The case of XLOG_FROM_ANY with the order to scan the
archives then pg_wal/ for each timeline is the most important bit,
surely.
--
Michael

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Sat, Mar 4, 2023 at 8:00 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 03, 2023 at 04:33:39PM -0800, Nathan Bossart wrote:
> > Given both Bharath and I missed this, perhaps we should add a comment about
> > this behavior.
>
> Makes sense to me to document that in a better way.  What about the
> addition of a short paragraph at the top of XLogFileReadAnyTLI() that
> explains the behaviors we expect depending on the values of
> XLogSource?  The case of XLOG_FROM_ANY with the order to scan the
> archives then pg_wal/ for each timeline is the most important bit,
> surely.

+1. I will send a patch in a bit.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Sat, Mar 4, 2023 at 8:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Mar 4, 2023 at 8:00 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Fri, Mar 03, 2023 at 04:33:39PM -0800, Nathan Bossart wrote:
> > > Given both Bharath and I missed this, perhaps we should add a comment about
> > > this behavior.
> >
> > Makes sense to me to document that in a better way.  What about the
> > addition of a short paragraph at the top of XLogFileReadAnyTLI() that
> > explains the behaviors we expect depending on the values of
> > XLogSource?  The case of XLOG_FROM_ANY with the order to scan the
> > archives then pg_wal/ for each timeline is the most important bit,
> > surely.
>
> +1. I will send a patch in a bit.

Okay, here's a patch attached.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Michael Paquier
Date:
On Sat, Mar 04, 2023 at 09:47:05AM +0530, Bharath Rupireddy wrote:
> Okay, here's a patch attached.

Thanks.

+ * When source == XLOG_FROM_ANY, this function first searches for the segment
+ * with a TLI in archive first, if not found, it searches in pg_wal. This way,
+ * if there is a WAL segment with same passed-in segno but different TLI
+ * present in both the archive and pg_wal, it prefers the one with higher TLI.
+ * The reason for this is that if for example we try to do archive recovery to
+ * timeline 2, which branched off timeline 1, but the WAL for timeline 2 is not
+ * archived yet, we would replay past the timeline switch point on timeline 1
+ * using the archived WAL segment, before even looking timeline 2's WAL
+ * segments in pg_wal.

This is pretty much what the commit has mentioned.  The first half
provides enough details, IMO.
--
Michael

Attachment

Re: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()

From
Bharath Rupireddy
Date:
On Mon, Mar 6, 2023 at 1:26 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Mar 04, 2023 at 09:47:05AM +0530, Bharath Rupireddy wrote:
> > Okay, here's a patch attached.
>
> Thanks.
>
> + * When source == XLOG_FROM_ANY, this function first searches for the segment
> + * with a TLI in archive first, if not found, it searches in pg_wal. This way,
> + * if there is a WAL segment with same passed-in segno but different TLI
> + * present in both the archive and pg_wal, it prefers the one with higher TLI.
> + * The reason for this is that if for example we try to do archive recovery to
> + * timeline 2, which branched off timeline 1, but the WAL for timeline 2 is not
> + * archived yet, we would replay past the timeline switch point on timeline 1
> + * using the archived WAL segment, before even looking timeline 2's WAL
> + * segments in pg_wal.
>
> This is pretty much what the commit has mentioned.  The first half
> provides enough details, IMO.

IMO, mentioning the example from the commit message in the function
comment makes things more clear - one doesn't have to go look for the
commit message for that.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com