Thread: Switching XLog source from archive to streaming when primary available
Switching XLog source from archive to streaming when primary available
From
SATYANARAYANA NARLAPURAM
Date:
Hi Hackers,
When the standby can't connect to the primary, it switches the XLog source from streaming to archive and remains in that state for as long as it can get WAL from the archive location. On a server with high WAL activity, fetching WAL from the archive is typically slower than streaming it from the primary, so the standby may never exit that state. This not only increases the lag on the standby but also adversely impacts the primary, as WAL accumulates and vacuum is not able to remove the dead tuples. As a mitigation, DBAs can remove/advance the slot or remove the restore_command on the standby, but this is manual work I am trying to avoid. I would like to propose the following; please let me know your thoughts.
- Automatically attempt to switch the source from archive to streaming when primary_conninfo is set, after replaying 'N' WAL segments, governed by the GUC retry_primary_conn_after_wal_segments
- When retry_primary_conn_after_wal_segments is set to -1, the feature is disabled
- When the retry attempt fails, switch back to the archive
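For illustration, the segment-count rule proposed above could be sketched as follows. The GUC name comes from the proposal; the function, the default value, and the counter semantics are invented here, not from any posted patch:

```c
#include <assert.h>
#include <stdbool.h>

/* GUC from the proposal; -1 disables the feature. Default is arbitrary here. */
static int retry_primary_conn_after_wal_segments = 8;

/*
 * Return true when the standby, currently reading from the archive, should
 * attempt to reconnect to the primary: the feature is enabled and at least
 * 'N' WAL segments have been replayed since streaming last failed.
 */
static bool
should_retry_streaming(int segments_replayed_since_failure)
{
	if (retry_primary_conn_after_wal_segments < 0)
		return false;			/* feature disabled */
	return segments_replayed_since_failure >=
		retry_primary_conn_after_wal_segments;
}
```

On a failed retry, the caller would reset the segment counter and fall back to the archive, per the third bullet.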
Thanks,
Satya
Re: Switching XLog source from archive to streaming when primary available
From
Dilip Kumar
Date:
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
> Hi Hackers,
>
> When the standby couldn't connect to the primary it switches the XLog
> source from streaming to archive and continues in that state until it can
> get the WAL from the archive location. On a server with high WAL activity,
> typically getting the WAL from the archive is slower than streaming it
> from the primary and couldn't exit from that state. This not only
> increases the lag on the standby but also adversely impacts the primary as
> the WAL gets accumulated, and vacuum is not able to collect the dead
> tuples. DBAs as a mitigation can however remove/advance the slot or remove
> the restore_command on the standby but this is a manual work I am trying
> to avoid. I would like to propose the following, please let me know your
> thoughts.
>
> Automatically attempt to switch the source from Archive to streaming when
> the primary_conninfo is set after replaying 'N' wal segment governed by
> the GUC retry_primary_conn_after_wal_segments
> when retry_primary_conn_after_wal_segments is set to -1 then the feature
> is disabled
> When the retry attempt fails, then switch back to the archive

I think there is another thread [1] that is logically trying to solve a similar issue: basically, in the main recovery apply loop, if the walreceiver does not exist then it launches the walreceiver. However, that patch does not change the current XLog source, which I think is not a good idea, because with it the standby would restore from the archive as well as stream from the primary; I have given that review comment on that thread as well. One big difference is that that patch launches the walreceiver even if the WAL is locally available and we don't really need more WAL, but that is controlled by a GUC.

[1] https://www.postgresql.org/message-id/CAKYtNApe05WmeRo92gTePEmhOM4myMpCK_%2BceSJtC7-AWLw1qw%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
> Hi Hackers,
>
> When the standby couldn't connect to the primary it switches the XLog
> source from streaming to archive and continues in that state until it can
> get the WAL from the archive location. On a server with high WAL activity,
> typically getting the WAL from the archive is slower than streaming it
> from the primary and couldn't exit from that state. This not only
> increases the lag on the standby but also adversely impacts the primary as
> the WAL gets accumulated, and vacuum is not able to collect the dead
> tuples. DBAs as a mitigation can however remove/advance the slot or remove
> the restore_command on the standby but this is a manual work I am trying
> to avoid. I would like to propose the following, please let me know your
> thoughts.
>
> Automatically attempt to switch the source from Archive to streaming when
> the primary_conninfo is set after replaying 'N' wal segment governed by
> the GUC retry_primary_conn_after_wal_segments
> when retry_primary_conn_after_wal_segments is set to -1 then the feature
> is disabled
> When the retry attempt fails, then switch back to the archive

I've gone through the state machine in WaitForWALToBecomeAvailable and I understand it this way: a failure to receive WAL from the primary causes the current source to switch to archive, and the standby then continues to get WAL from the archive location; unless some failure occurs there, the current source never switches back to stream. Given that getting WAL from the archive location causes delay in production environments, we miss taking advantage of a reconnection to the primary after a previously failed attempt. So basically, we attempt to switch to streaming from archive (even though fetching from archive can succeed) after a certain amount of time or number of WAL segments.
I prefer a timing-based switch to streaming from archive rather than one based on the number of WAL segments fetched from archive. Right now, wal_retrieve_retry_interval is used to wait before switching to archive after a failed attempt to stream; IMO, a similar GUC (which starts counting once the source has switched from streaming to archive, and on timeout switches the source to streaming again) can be used to switch from archive to streaming after the specified amount of time. Thoughts?

Regards,
Bharath Rupireddy.
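A minimal sketch of the timing-based decision described above. The function, GUC name, and default are invented for illustration; a real patch would hook this into WaitForWALToBecomeAvailable and use TimestampTz arithmetic rather than raw milliseconds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GUC in milliseconds; 0 disables the periodic retry. */
static int streaming_retry_interval_ms = 5000;

/*
 * Return true when the standby, reading from the archive, should attempt
 * streaming again: the configured interval has elapsed since the source
 * last switched from streaming to archive.
 */
static bool
time_to_retry_streaming(int64_t switched_to_archive_at_ms, int64_t now_ms)
{
	if (streaming_retry_interval_ms == 0)
		return false;			/* feature disabled */
	return (now_ms - switched_to_archive_at_ms) >= streaming_retry_interval_ms;
}
```

On a successful retry, the caller would move the state machine to XLOG_FROM_STREAM; on failure, it would record the new switch time and fall back to the archive.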
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Sat, Apr 30, 2022 at 6:19 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Nov 29, 2021 at 1:30 AM SATYANARAYANA NARLAPURAM
> <satyanarlapuram@gmail.com> wrote:
> >
> > Hi Hackers,
> >
> > When the standby couldn't connect to the primary it switches the XLog
> > source from streaming to archive and continues in that state until it
> > can get the WAL from the archive location. On a server with high WAL
> > activity, typically getting the WAL from the archive is slower than
> > streaming it from the primary and couldn't exit from that state. This
> > not only increases the lag on the standby but also adversely impacts the
> > primary as the WAL gets accumulated, and vacuum is not able to collect
> > the dead tuples. DBAs as a mitigation can however remove/advance the
> > slot or remove the restore_command on the standby but this is a manual
> > work I am trying to avoid. I would like to propose the following, please
> > let me know your thoughts.
> >
> > Automatically attempt to switch the source from Archive to streaming
> > when the primary_conninfo is set after replaying 'N' wal segment
> > governed by the GUC retry_primary_conn_after_wal_segments
> > when retry_primary_conn_after_wal_segments is set to -1 then the feature
> > is disabled
> > When the retry attempt fails, then switch back to the archive
>
> I've gone through the state machine in WaitForWALToBecomeAvailable and
> I understand it this way: failed to receive WAL records from the
> primary causes the current source to switch to archive and the standby
> continues to get WAL records from archive location unless some failure
> occurs there the current source is never going to switch back to
> stream. Given the fact that getting WAL from archive location causes
> delay in production environments, we miss to take the advantage of the
> reconnection to primary after previous failed attempt.
>
> So basically, we try to attempt to switch to streaming from archive
> (even though fetching from archive can succeed) after a certain amount
> of time or WAL segments. I prefer timing-based switch to streaming
> from archive instead of after a number of WAL segments fetched from
> archive. Right now, wal_retrieve_retry_interval is being used to wait
> before switching to archive after failed attempt from streaming, IMO,
> a similar GUC (that gets set once the source switched from streaming
> to archive and on timeout it switches to streaming again) can be used
> to switch from archive to streaming after the specified amount of
> time.
>
> Thoughts?

Here's a v1 patch that I've come up with. I'm right now using the existing GUC wal_retrieve_retry_interval to switch to stream mode from archive mode, as opposed to switching only after a failure to get WAL in archive mode. If the approach looks okay, I can add tests, change the docs, and add a new GUC to control this behaviour. I'm open to thoughts and ideas here.

Regards,
Bharath Rupireddy.
Attachment
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Hello

I tested this patch in a setup where the standby is in the middle of replicating and REDOing the primary's WAL files during a very large data insertion. During this time, I kept killing the walreceiver process to cause a stream failure and force the standby to read from the archive. The system restores from the archive for "wal_retrieve_retry_interval" seconds before it attempts to stream again. Without this patch, once streaming is interrupted, the standby keeps reading from the archive until it reaches the same consistent state as the primary, and only then switches back to streaming. So it seems that the patch does the job as described and does bring some benefit during a very large REDO job, where it will try to re-stream after restoring some WALs from the archive to speed up this "catch up" process. But if the recovery job is not a large one, PG already switches back to streaming once it hits a consistent state.

thank you

Cary Huang
HighGo Software Canada
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Sat, Jun 25, 2022 at 1:31 AM Cary Huang <cary.huang@highgo.ca> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: not tested
>
> Hello
>
> I tested this patch in a setup where the standby is in the middle of
> replicating and REDOing the primary's WAL files during a very large data
> insertion. During this time, I kept killing the walreceiver process to
> cause a stream failure and force the standby to read from the archive.
> The system restores from the archive for "wal_retrieve_retry_interval"
> seconds before it attempts to stream again. Without this patch, once
> streaming is interrupted, the standby keeps reading from the archive
> until it reaches the same consistent state as the primary, and only then
> switches back to streaming. So it seems that the patch does the job as
> described and does bring some benefit during a very large REDO job, where
> it will try to re-stream after restoring some WALs from the archive to
> speed up this "catch up" process. But if the recovery job is not a large
> one, PG already switches back to streaming once it hits a consistent
> state.

Thanks a lot Cary for testing the patch.

> Here's a v1 patch that I've come up with. I'm right now using the
> existing GUC wal_retrieve_retry_interval to switch to stream mode from
> archive mode as opposed to switching only after the failure to get WAL
> from archive mode. If okay with the approach, I can add tests, change
> the docs and add a new GUC to control this behaviour. I'm open to
> thoughts and ideas here.

It will be great if I can hear some thoughts on the above points (as posted upthread).

Regards,
Bharath Rupireddy.
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Jul 8, 2022 at 9:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Jun 25, 2022 at 1:31 AM Cary Huang <cary.huang@highgo.ca> wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world: tested, passed
> > Implements feature: tested, passed
> > Spec compliant: not tested
> > Documentation: not tested
> >
> > Hello
> >
> > I tested this patch in a setup where the standby is in the middle of
> > replicating and REDOing the primary's WAL files during a very large data
> > insertion. During this time, I kept killing the walreceiver process to
> > cause a stream failure and force the standby to read from the archive.
> > The system restores from the archive for "wal_retrieve_retry_interval"
> > seconds before it attempts to stream again. Without this patch, once
> > streaming is interrupted, the standby keeps reading from the archive
> > until it reaches the same consistent state as the primary, and only then
> > switches back to streaming. So it seems that the patch does the job as
> > described and does bring some benefit during a very large REDO job,
> > where it will try to re-stream after restoring some WALs from the
> > archive to speed up this "catch up" process. But if the recovery job is
> > not a large one, PG already switches back to streaming once it hits a
> > consistent state.
>
> Thanks a lot Cary for testing the patch.
>
> > Here's a v1 patch that I've come up with. I'm right now using the
> > existing GUC wal_retrieve_retry_interval to switch to stream mode from
> > archive mode as opposed to switching only after the failure to get WAL
> > from archive mode. If okay with the approach, I can add tests, change
> > the docs and add a new GUC to control this behaviour. I'm open to
> > thoughts and ideas here.
>
> It will be great if I can hear some thoughts on the above points (as
> posted upthread).
Here's the v2 patch with a separate GUC; a new GUC was necessary as the existing GUC wal_retrieve_retry_interval is already loaded with multiple usages. When the feature is enabled, it lets the standby switch to stream mode, i.e. fetch WAL from the primary, even before fetching from the archive fails. The switch to stream mode from archive happens in 2 scenarios: 1) when the standby is in initial recovery, and 2) when there was a failure in receiving from the primary (the walreceiver was killed, crashed, or timed out, or connectivity to the primary was broken, for whatever reason). I also added test cases to the v2 patch. Please review the patch.

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
+ <indexterm>
+  <primary><varname>wal_source_switch_interval</varname> configuration parameter</primary>
+ </indexterm>

I don't want to bikeshed on the name too much, but I do think we need something more descriptive. I'm thinking of something like streaming_replication_attempt_interval or streaming_replication_retry_interval.

+        Specifies how long the standby server should wait before switching WAL
+        source from WAL archive to primary (streaming replication). This can
+        happen either during the standby initial recovery or after a previous
+        failed attempt to stream WAL from the primary.

I'm not sure what the second sentence means. In general, I think the explanation in your commit message is much clearer:

    The standby makes an attempt to read WAL from primary after
    wal_retrieve_retry_interval milliseconds reading from archive.

+        If this value is specified without units, it is taken as milliseconds.
+        The default value is 5 seconds. A setting of <literal>0</literal>
+        disables the feature.

5 seconds seems low. I would expect the default to be 1-5 minutes. I think it's important to strike a balance between interrupting archive recovery to attempt streaming replication and letting archive recovery make progress.

+ * Try reading WAL from primary after every wal_source_switch_interval
+ * milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
+ * successful, the state machine moves to XLOG_FROM_STREAM state, otherwise
+ * it falls back to XLOG_FROM_ARCHIVE state.

It's not clear to me how this is expected to interact with the pg_wal phase of standby recovery. As the docs note [0], standby servers loop through archive recovery, recovery from pg_wal, and streaming replication. Does this cause the pg_wal phase to be skipped (i.e., the standby goes straight from archive recovery to streaming replication)? I wonder if it'd be better for this mechanism to simply move the standby to the pg_wal phase so that the usual ordering isn't changed.
+ if (!first_time &&
+     TimestampDifferenceExceeds(last_switch_time, curr_time,
+                                wal_source_switch_interval))

Shouldn't this also check that wal_source_switch_interval is not set to 0?

+ elog(DEBUG2,
+      "trying to switch WAL source to %s after fetching WAL from %s for %d milliseconds",
+      xlogSourceNames[XLOG_FROM_STREAM],
+      xlogSourceNames[currentSource],
+      wal_source_switch_interval);
+
+ last_switch_time = curr_time;

Shouldn't the last_switch_time be set when the state machine first enters XLOG_FROM_ARCHIVE? IIUC this logic is currently counting time spent elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a source switch. This would mean that a standby that has spent a lot of time in streaming replication before failing would flip to XLOG_FROM_ARCHIVE, immediately flip back to XLOG_FROM_STREAM, and then likely flip back to XLOG_FROM_ARCHIVE when it failed again. Given the standby will wait for wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it seems like we could end up rapidly looping between sources. Perhaps I am misunderstanding how this is meant to work.

+ {
+     {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY,
+         gettext_noop("Sets the time to wait before switching WAL source from archive to primary"),
+         gettext_noop("0 turns this feature off."),
+         GUC_UNIT_MS
+     },
+     &wal_source_switch_interval,
+     5000, 0, INT_MAX,
+     NULL, NULL, NULL
+ },

I wonder if the lower bound should be higher to avoid switching unnecessarily rapidly between WAL sources. I see that WaitForWALToBecomeAvailable() ensures that standbys do not switch from XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per wal_retrieve_retry_interval. Perhaps wal_retrieve_retry_interval should be the lower bound for this GUC, too. Or maybe WaitForWALToBecomeAvailable() should make sure that the standby makes at least one attempt to restore the file from archive before switching to streaming replication.
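The timer concern above amounts to: start the clock when the state machine enters XLOG_FROM_ARCHIVE, not when streaming was last attempted. A small illustrative model of that rule (type and function names are invented, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified two-source model of the recovery state machine. */
typedef enum { SRC_ARCHIVE, SRC_STREAM } WalSource;

typedef struct
{
	WalSource	source;
	int64_t		entered_archive_at_ms;	/* reset on each entry into archive */
} SourceState;

static void
switch_source(SourceState *st, WalSource next, int64_t now_ms)
{
	/* The retry clock starts only when we actually enter archive mode. */
	if (next == SRC_ARCHIVE && st->source != SRC_ARCHIVE)
		st->entered_archive_at_ms = now_ms;
	st->source = next;
}

/* Force a streaming retry only after spending interval_ms in archive mode. */
static bool
force_stream_retry(const SourceState *st, int64_t now_ms, int interval_ms)
{
	return st->source == SRC_ARCHIVE &&
		interval_ms > 0 &&
		(now_ms - st->entered_archive_at_ms) >= interval_ms;
}
```

With this rule, a standby that streamed for days before failing does not immediately flip back to streaming: the interval is measured from the moment it fell back to the archive.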
[0] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> + <indexterm>
> +  <primary><varname>wal_source_switch_interval</varname> configuration parameter</primary>
> + </indexterm>
>
> I don't want to bikeshed on the name too much, but I do think we need
> something more descriptive. I'm thinking of something like
> streaming_replication_attempt_interval or
> streaming_replication_retry_interval.

I could only come up with wal_source_switch_interval after a lot of bikeshedding myself :). However, streaming_replication_retry_interval looks much better; I've used it in the latest patch. Thanks.

> +        Specifies how long the standby server should wait before switching WAL
> +        source from WAL archive to primary (streaming replication). This can
> +        happen either during the standby initial recovery or after a previous
> +        failed attempt to stream WAL from the primary.
>
> I'm not sure what the second sentence means. In general, I think the
> explanation in your commit message is much clearer:

I polished the comments, docs and commit message a bit, please check now.

> 5 seconds seems low. I would expect the default to be 1-5 minutes. I
> think it's important to strike a balance between interrupting archive
> recovery to attempt streaming replication and letting archive recovery
> make progress.

+1 for a default value of 5 minutes, to avoid frequent interruptions of archive mode when the primary is really down for a long time. I've also added a cautionary note in the docs about lower values.

> + * Try reading WAL from primary after every wal_source_switch_interval
> + * milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
> + * successful, the state machine moves to XLOG_FROM_STREAM state, otherwise
> + * it falls back to XLOG_FROM_ARCHIVE state.
>
> It's not clear to me how this is expected to interact with the pg_wal phase
> of standby recovery. As the docs note [0], standby servers loop through
> archive recovery, recovery from pg_wal, and streaming replication. Does
> this cause the pg_wal phase to be skipped (i.e., the standby goes straight
> from archive recovery to streaming replication)? I wonder if it'd be
> better for this mechanism to simply move the standby to the pg_wal phase so
> that the usual ordering isn't changed.
>
> [0] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In standby mode, when restore_command is specified, the initial value of currentSource is XLOG_FROM_ARCHIVE (see [1]). If the archive is exhausted of WAL, or the standby fails to fetch from the archive, then it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL from the primary, it switches back to XLOG_FROM_ARCHIVE. This continues unless the standby gets promoted. With the patch, we enable the standby to try fetching from the primary instead of waiting for the WAL in the archive to be exhausted, or for an error to occur while receiving from the archive.

> + if (!first_time &&
> +     TimestampDifferenceExceeds(last_switch_time, curr_time,
> +                                wal_source_switch_interval))
>
> Shouldn't this also check that wal_source_switch_interval is not set to 0?

Corrected.

> + elog(DEBUG2,
> +      "trying to switch WAL source to %s after fetching WAL from %s for %d milliseconds",
> +      xlogSourceNames[XLOG_FROM_STREAM],
> +      xlogSourceNames[currentSource],
> +      wal_source_switch_interval);
> +
> + last_switch_time = curr_time;
>
> Shouldn't the last_switch_time be set when the state machine first enters
> XLOG_FROM_ARCHIVE? IIUC this logic is currently counting time spent
> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
> source switch. This would mean that a standby that has spent a lot of time
> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
> XLOG_FROM_ARCHIVE when it failed again. Given the standby will wait for
> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
> seems like we could end up rapidly looping between sources. Perhaps I am
> misunderstanding how this is meant to work.

last_switch_time indicates the time when the standby last attempted to switch to the primary. For instance, a standby:
1) for the first time, sets last_switch_time = current_time when in archive mode
2) if current_time < last_switch_time + interval, continues to be in archive mode
3) if current_time >= last_switch_time + interval, attempts to switch to the primary and sets last_switch_time = current_time
3.1) if it successfully switches to the primary, it continues there; if for any reason it fails to fetch from the primary, it enters archive mode and loops from step (2)
3.2) if it fails to switch to the primary, it enters archive mode and loops from step (2)
Hope this clarifies the behaviour.

> + {
> +     {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY,
> +         gettext_noop("Sets the time to wait before switching WAL source from archive to primary"),
> +         gettext_noop("0 turns this feature off."),
> +         GUC_UNIT_MS
> +     },
> +     &wal_source_switch_interval,
> +     5000, 0, INT_MAX,
> +     NULL, NULL, NULL
> + },
>
> I wonder if the lower bound should be higher to avoid switching
> unnecessarily rapidly between WAL sources. I see that
> WaitForWALToBecomeAvailable() ensures that standbys do not switch from
> XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
> wal_retrieve_retry_interval. Perhaps wal_retrieve_retry_interval should be
> the lower bound for this GUC, too. Or maybe WaitForWALToBecomeAvailable()
> should make sure that the standby makes at least one attempt to restore
> the file from archive before switching to streaming replication.

No, we need a way to disable the feature, so I'm not changing the lower bound. And let's not make this GUC dependent on any other GUC; I would like to keep it simple for better usability. However, I've increased the default value to 5min and added a note in the docs about lower values.

I'm attaching the v3 patch with the review comments addressed, please review it further.

[1]
    if (!InArchiveRecovery)
        currentSource = XLOG_FROM_PG_WAL;
    else if (currentSource == XLOG_FROM_ANY ||
             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
    {
        lastSourceFailed = false;
        currentSource = XLOG_FROM_ARCHIVE;
    }

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> It's not clear to me how this is expected to interact with the pg_wal phase
>> of standby recovery. As the docs note [0], standby servers loop through
>> archive recovery, recovery from pg_wal, and streaming replication. Does
>> this cause the pg_wal phase to be skipped (i.e., the standby goes straight
>> from archive recovery to streaming replication)? I wonder if it'd be
>> better for this mechanism to simply move the standby to the pg_wal phase so
>> that the usual ordering isn't changed.
>
> It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In
> standby mode when restore_command is specified, the initial value of
> currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is
> exhausted of WAL or the standby fails to fetch from the archive, then
> it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL
> from primary, it switches back to XLOG_FROM_ARCHIVE. This continues
> unless the standby gets promoted. With the patch, we enable the
> standby to try fetching from the primary, instead of waiting for WAL
> in the archive to get exhausted or for an error to occur in the
> standby while receiving from the archive.

Okay. I see that you are checking for XLOG_FROM_ARCHIVE.

>> Shouldn't the last_switch_time be set when the state machine first enters
>> XLOG_FROM_ARCHIVE? IIUC this logic is currently counting time spent
>> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
>> source switch. This would mean that a standby that has spent a lot of time
>> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
>> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
>> XLOG_FROM_ARCHIVE when it failed again. Given the standby will wait for
>> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
>> seems like we could end up rapidly looping between sources. Perhaps I am
>> misunderstanding how this is meant to work.
>
> last_switch_time indicates the time when the standby last attempted to
> switch to primary. For instance, a standby:
> 1) for the first time, sets last_switch_time = current_time when in archive mode
> 2) if current_time < last_switch_time + interval, continues to be in
> archive mode
> 3) if current_time >= last_switch_time + interval, attempts to switch
> to primary and sets last_switch_time = current_time
> 3.1) if successfully switches to primary, continues in there and for
> any reason fails to fetch from primary, then enters archive mode and
> loops from step (2)
> 3.2) if fails to switch to primary, then enters archive mode and loops
> from step (2)

Let's say I have this new parameter set to 5 minutes, and I have a standby that's been at step 3.1 for 5 days before failing and going back to step 2. Won't the standby immediately jump back to step 3.1? I think we should place the limit on how long the server stays in XLOG_FROM_ARCHIVE, not how long it's been since we last tried XLOG_FROM_STREAM.

>> I wonder if the lower bound should be higher to avoid switching
>> unnecessarily rapidly between WAL sources. I see that
>> WaitForWALToBecomeAvailable() ensures that standbys do not switch from
>> XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
>> wal_retrieve_retry_interval. Perhaps wal_retrieve_retry_interval should be
>> the lower bound for this GUC, too. Or maybe WaitForWALToBecomeAvailable()
>> should make sure that the standby makes at least one attempt to restore
>> the file from archive before switching to streaming replication.
>
> No, we need a way to disable the feature, so I'm not changing the
> lower bound. And let's not make this GUC dependent on any other GUC, I
> would like to keep it simple for better usability. However, I've
> increased the default value to 5min and added a note in the docs about
> the lower values.
>
> I'm attaching the v3 patch with the review comments addressed, please
> review it further.

My general point is that we should probably offer some basic preventative measure against flipping back and forth between streaming and archive recovery while making zero progress. As I noted, maybe that's as simple as having WaitForWALToBecomeAvailable() attempt to restore a file from archive at least once before the new parameter forces us to switch to streaming replication. There might be other ways to handle this.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
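The preventative measure suggested here could look roughly like the following sketch, which requires at least one restore_command attempt from the archive before the retry timer is honored. All names are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Count restore attempts since the source last switched to archive. */
static int	archive_attempts_since_switch = 0;

/* Called when the state machine enters archive mode. */
static void
entered_archive_mode(void)
{
	archive_attempts_since_switch = 0;
}

/* Called after each restore_command invocation (success or failure). */
static void
attempted_archive_restore(void)
{
	archive_attempts_since_switch++;
}

/*
 * Honor the retry timer only after at least one restore attempt from the
 * archive, so a standby cannot flip between sources with zero progress.
 */
static bool
may_force_streaming(bool retry_interval_elapsed)
{
	return retry_interval_elapsed && archive_attempts_since_switch >= 1;
}
```

This keeps the GUC independent of wal_retrieve_retry_interval while still preventing the immediate flip-back scenario described above.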
Re: Switching XLog source from archive to streaming when primary available
From
Kyotaro Horiguchi
Date:
Being late for the party.

It seems to me that the function is getting too long. I think we might want to move the core part of the patch into another function.

I think it might be better if intentionalSourceSwitch doesn't need lastSourceFailed set. It would look like this:

>    if (lastSourceFailed || switchSource)
>    {
>        if (nonblocking && lastSourceFailed)
>            return XLREAD_WOULDBLOCK;

+    if (first_time)
+        last_switch_time = curr_time;
..
+    if (!first_time &&
+        TimestampDifferenceExceeds(last_switch_time, curr_time,
..
+    /* We're not here for the first time any more */
+    if (first_time)
+        first_time = false;

I don't think the flag first_time is needed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
From
Kyotaro Horiguchi
Date:
At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
> On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> > I'm attaching the v3 patch with the review comments addressed, please
> > review it further.
>
> My general point is that we should probably offer some basic preventative
> measure against flipping back and forth between streaming and archive
> recovery while making zero progress. As I noted, maybe that's as simple as
> having WaitForWALToBecomeAvailable() attempt to restore a file from archive
> at least once before the new parameter forces us to switch to streaming
> replication. There might be other ways to handle this.

+1.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
> > On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> > > I'm attaching the v3 patch with the review comments addressed, please
> > > review it further.
> >
> > My general point is that we should probably offer some basic preventative
> > measure against flipping back and forth between streaming and archive
> > recovery while making zero progress. As I noted, maybe that's as simple as
> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive
> > at least once before the new parameter forces us to switch to streaming
> > replication. There might be other ways to handle this.
>
> +1.

Hm. In that case, I think we can get rid of the timeout-based switching mechanism and have this behaviour instead: the standby attempts to switch to streaming mode from archive, say, after fetching 1, 2 or a configurable number of WAL files. In fact, this is the original idea proposed by Satya in this thread. If that's okay, I can code it up. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote: > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in >> > My general point is that we should probably offer some basic preventative >> > measure against flipping back and forth between streaming and archive >> > recovery while making zero progress. As I noted, maybe that's as simple as >> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive >> > at least once before the new parameter forces us to switch to streaming >> > replication. There might be other ways to handle this. >> >> +1. > > Hm. In that case, I think we can get rid of timeout based switching > mechanism and have this behaviour - the standby can attempt to switch > to streaming mode from archive, say, after fetching 1, 2 or a > configurable number of WAL files. In fact, this is the original idea > proposed by Satya in this thread. IMO the timeout approach would be more intuitive for users. When it comes to archive recovery, "WAL segment" isn't a standard unit of measure. WAL segment size can differ between clusters, and WAL files can have different amounts of data or take different amounts of time to replay. So I think it would be difficult for the end user to decide on a value. However, even the timeout approach has this sort of problem. If your parameter is set to 1 minute, but the current archive takes 5 minutes to recover, you won't really be testing streaming replication once a minute. That would likely need to be documented. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote: > > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in > >> > My general point is that we should probably offer some basic preventative > >> > measure against flipping back and forth between streaming and archive > >> > recovery while making zero progress. As I noted, maybe that's as simple as > >> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive > >> > at least once before the new parameter forces us to switch to streaming > >> > replication. There might be other ways to handle this. > >> > >> +1. > > > > Hm. In that case, I think we can get rid of timeout based switching > > mechanism and have this behaviour - the standby can attempt to switch > > to streaming mode from archive, say, after fetching 1, 2 or a > > configurable number of WAL files. In fact, this is the original idea > > proposed by Satya in this thread. > > IMO the timeout approach would be more intuitive for users. When it comes > to archive recovery, "WAL segment" isn't a standard unit of measure. WAL > segment size can differ between clusters, and WAL files can have different > amounts of data or take different amounts of time to replay. How about the amount of WAL bytes fetched from the archive after which a standby attempts to connect to primary or enter streaming mode? Of late, we've changed some GUCs to represent bytes instead of WAL files/segments, see [1]. > So I think it > would be difficult for the end user to decide on a value. However, even > the timeout approach has this sort of problem. If your parameter is set to > 1 minute, but the current archive takes 5 minutes to recover, you won't > really be testing streaming replication once a minute. 
> That would likely > need to be documented. If we have configurable WAL bytes instead of a timeout for the standby WAL source switch from archive to primary, we don't have the above problem, right? [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3fe108c025e4a080315562d4c15ecbe3f00405e -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Fri, Sep 09, 2022 at 11:07:00PM +0530, Bharath Rupireddy wrote: > On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> IMO the timeout approach would be more intuitive for users. When it comes >> to archive recovery, "WAL segment" isn't a standard unit of measure. WAL >> segment size can differ between clusters, and WAL files can have different >> amounts of data or take different amounts of time to replay. > > How about the amount of WAL bytes fetched from the archive after which > a standby attempts to connect to primary or enter streaming mode? Of > late, we've changed some GUCs to represent bytes instead of WAL > files/segments, see [1]. Well, for wal_keep_size, using bytes makes sense. Given you know how much disk space you have, you can set this parameter accordingly to avoid retaining too much of it for standby servers. For your proposed parameter, it's not so simple. The same setting could have wildly different timing behavior depending on the server. I still think that a timeout is the most intuitive. >> So I think it >> would be difficult for the end user to decide on a value. However, even >> the timeout approach has this sort of problem. If your parameter is set to >> 1 minute, but the current archive takes 5 minutes to recover, you won't >> really be testing streaming replication once a minute. That would likely >> need to be documented. > > If we have configurable WAL bytes instead of timeout for standby WAL > source switch from archive to primary, we don't have the above problem > right? If you are going to stop replaying in the middle of a WAL archive, then maybe. But I don't think I'd recommend that. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
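To illustrate the point above with made-up numbers (a rough sketch, not a benchmark): the same byte-based threshold translates into very different retry cadences depending on archive restore speed. The helper name and speeds are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: seconds spent restoring `threshold_bytes` of WAL
 * from the archive at a given restore speed, i.e. how long a standby
 * would wait before retrying streaming under a byte-based setting. */
double
seconds_until_retry(int64_t threshold_bytes, double restore_bytes_per_sec)
{
    return (double) threshold_bytes / restore_bytes_per_sec;
}
```

With a 1 GiB threshold, a standby restoring at 100 MB/s would retry streaming after roughly 11 seconds, while one restoring at 5 MB/s would wait well over 3 minutes — the "wildly different timing behavior" described above.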
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Sat, Sep 10, 2022 at 3:35 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > Well, for wal_keep_size, using bytes makes sense. Given you know how much > disk space you have, you can set this parameter accordingly to avoid > retaining too much of it for standby servers. For your proposed parameter, > it's not so simple. The same setting could have wildly different timing > behavior depending on the server. I still think that a timeout is the most > intuitive. Hm. In the v3 patch, I've used the timeout approach, but tracking the duration the server spends in XLOG_FROM_ARCHIVE as opposed to tracking the last failed time in streaming from the primary. > So I think it > would be difficult for the end user to decide on a value. However, even > the timeout approach has this sort of problem. If your parameter is set to > 1 minute, but the current archive takes 5 minutes to recover, you won't > really be testing streaming replication once a minute. That would likely > need to be documented. Added a note in the docs. On Fri, Sep 9, 2022 at 10:46 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > Being late for the party. Thanks for reviewing this. > It seems to me that the function is getting too long. I think we > might want to move the core part of the patch into another function. Yeah, WaitForWALToBecomeAvailable() (without this patch) has around 460 LOC, out of which WAL fetching from the chosen source accounts for about 240 LOC; IMO, this code is a candidate for a new function. I think that part can be discussed separately. Having said that, I moved the new code to a new function. > I think it might be better if intentionalSourceSwitch doesn't need > lastSourceFailed set. It would look like this: > > > if (lastSourceFailed || switchSource) > > { > > if (nonblocking && lastSourceFailed) > > return XLREAD_WOULDBLOCK; I think the above looks good, done that way in the latest patch. > I don't think the flag first_time is needed. Addressed this in the v4 patch. 
Please review the attached v4 patch addressing above review comments. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
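For readers following the control-flow discussion above, the restructured condition can be sketched as a standalone function. This is an illustrative sketch, not the actual PostgreSQL code: XLREAD_WOULDBLOCK matches the quoted snippet, while XLREAD_PROCEED, XLREAD_SWITCH, and decide_read_action() are hypothetical names introduced here.

```c
#include <stdbool.h>

typedef enum
{
    XLREAD_PROCEED,     /* illustrative: keep reading from the current source */
    XLREAD_WOULDBLOCK,  /* non-blocking caller must not wait */
    XLREAD_SWITCH       /* illustrative: pick the next WAL source */
} XLogReadAction;

/* An intentional source switch takes the same path as a source failure,
 * but only a genuine failure makes a non-blocking caller return early,
 * mirroring the `if (lastSourceFailed || switchSource)` shape above. */
XLogReadAction
decide_read_action(bool lastSourceFailed, bool switchSource, bool nonblocking)
{
    if (lastSourceFailed || switchSource)
    {
        if (nonblocking && lastSourceFailed)
            return XLREAD_WOULDBLOCK;

        /* Intentional switch, or a failure in blocking mode: move on. */
        return XLREAD_SWITCH;
    }
    return XLREAD_PROCEED;
}
```

The point of the restructuring is that an intentional switch no longer needs lastSourceFailed to be set, yet a non-blocking caller still only bails out on a real failure.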
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Mon, Sep 12, 2022 at 9:03 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Please review the attached v4 patch addressing above review comments. Oops, there's a compiler warning [1] with the v4 patch, fixed it. Please review the attached v5 patch. [1] https://cirrus-ci.com/task/5730076611313664?logs=gcc_warning#L450 -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Mon, Sep 12, 2022 at 11:56 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Please review the attached v5 patch. I'm attaching the v6 patch that's rebased on to the latest HEAD. Please consider this for review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Kyotaro Horiguchi
Date:
At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > I'm attaching the v6 patch that's rebased on to the latest HEAD. > Please consider this for review. Thanks for the new version! +#define StreamingReplRetryEnabled() \ + (streaming_replication_retry_interval > 0 && \ + StandbyMode && \ + currentSource == XLOG_FROM_ARCHIVE) It seems to me a bit too complex. + /* Save the timestamp at which we're switching to archive. */ + if (StreamingReplRetryEnabled()) + switched_to_archive_at = GetCurrentTimestamp(); Anyway, we are going to open a file just after this, so GetCurrentTimestamp() cannot cause a perceptible degradation. Couldn't we do that unconditionally, to get rid of the macro? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > I'm attaching the v6 patch that's rebased on to the latest HEAD. > > Please consider this for review. > > Thanks for the new version! > > +#define StreamingReplRetryEnabled() \ > + (streaming_replication_retry_interval > 0 && \ > + StandbyMode && \ > + currentSource == XLOG_FROM_ARCHIVE) > > It seems to me a bit too complex. I don't think so; it just tells whether a standby is allowed to switch its source from archive to streaming. > + /* Save the timestamp at which we're switching to archive. */ > + if (StreamingReplRetryEnabled()) > + switched_to_archive_at = GetCurrentTimestamp(); > > Anyway, we are going to open a file just after this, so > GetCurrentTimestamp() cannot cause a perceptible degradation. > Couldn't we do that unconditionally, to get rid of the macro? Do we really need to do it unconditionally? I don't think so. And we can't get rid of the macro, as we need to check the current source, the GUC, and standby mode. When this feature is disabled, it mustn't execute any extra code IMO. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Kyotaro Horiguchi
Date:
At Fri, 16 Sep 2022 09:15:58 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > > > I'm attaching the v6 patch that's rebased on to the latest HEAD. > > > Please consider this for review. > > > > Thanks for the new version! > > > > +#define StreamingReplRetryEnabled() \ > > + (streaming_replication_retry_interval > 0 && \ > > + StandbyMode && \ > > + currentSource == XLOG_FROM_ARCHIVE) > > > > It seems to me a bit too complex. In other words, it seems to me that the macro name doesn't manifest the condition correctly. > I don't think so; it just tells whether a standby is allowed to switch > its source from archive to streaming. > > > + /* Save the timestamp at which we're switching to archive. */ > > + if (StreamingReplRetryEnabled()) > > + switched_to_archive_at = GetCurrentTimestamp(); > > > > Anyway, we are going to open a file just after this, so > > GetCurrentTimestamp() cannot cause a perceptible degradation. > > Couldn't we do that unconditionally, to get rid of the macro? > > Do we really need to do it unconditionally? I don't think so. And we > can't get rid of the macro, as we need to check the current > source, the GUC, and standby mode. When this feature is disabled, it > mustn't execute any extra code IMO. I don't think we particularly want to do that unconditionally. I just wanted to get rid of the macro from the usage site. Even if the same condition is used elsewhere, I see it as better to write out the condition directly there. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Sep 16, 2022 at 12:06 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > In other words, it seems to me that the macro name doesn't manifest > the condition correctly. > > I don't think we particularly want to do that unconditionally. > I just wanted to get rid of the macro from the usage site. Even if > the same condition is used elsewhere, I see it as better to write out the > condition directly there. I wanted to avoid a bit of duplicate code there. How about naming that macro IsXLOGSourceSwitchToStreamEnabled() or SwitchFromArchiveToStreamEnabled() or just SwitchFromArchiveToStream(), or any other better name? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Sep 16, 2022 at 4:58 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Sep 16, 2022 at 12:06 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > In other words, it seems to me that the macro name doesn't manifest > > the condition correctly. > > > > I don't think we particularly want to do that unconditionally. > > I just wanted to get rid of the macro from the usage site. Even if > > the same condition is used elsewhere, I see it as better to write out the > > condition directly there. > > I wanted to avoid a bit of duplicate code there. How about naming that > macro IsXLOGSourceSwitchToStreamEnabled() or > SwitchFromArchiveToStreamEnabled() or just SwitchFromArchiveToStream() > or any other better name? SwitchFromArchiveToStreamEnabled() seemed better at this point. I'm attaching the v7 patch with that change. Please review it further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Mon, Sep 19, 2022 at 07:49:21PM +0530, Bharath Rupireddy wrote: > SwitchFromArchiveToStreamEnabled() seemed better at this point. I'm > attaching the v7 patch with that change. Please review it further. As I mentioned upthread [0], I'm still a little concerned that this patch will cause the state machine to go straight from archive recovery to streaming replication, skipping recovery from pg_wal. I wonder if this could be resolved by moving the standby to the pg_wal phase instead. Concretely, this line + if (switchSource) + break; would instead change currentSource from XLOG_FROM_ARCHIVE to XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI(). I suspect the behavior would be basically the same, but it would maintain the existing ordering. However, I do see the following note elsewhere in xlogrecovery.c: * The segment can be fetched via restore_command, or via walreceiver having * streamed the record, or it can already be present in pg_wal. Checking * pg_wal is mainly for crash recovery, but it will be polled in standby mode * too, in case someone copies a new segment directly to pg_wal. That is not * documented or recommended, though. Given this information, the present behavior might not be too important, but I don't see a point in changing it without good reason. [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
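The source ordering under discussion — archive first, then pg_wal, and only then streaming — can be sketched as a small transition function. The XLOG_FROM_* names follow xlogrecovery.c (the real enum also has XLOG_FROM_ANY), but next_source() and the retry_triggered flag are illustrative, not the actual implementation:

```c
#include <stdbool.h>

typedef enum
{
    XLOG_FROM_ARCHIVE,  /* restored using restore_command */
    XLOG_FROM_PG_WAL,   /* existing file in pg_wal */
    XLOG_FROM_STREAM    /* streamed from primary */
} XLogSource;

/* When the retry trigger fires while reading from the archive, fall
 * through to pg_wal first rather than jumping straight to streaming,
 * which preserves the existing source ordering. */
XLogSource
next_source(XLogSource cur, bool retry_triggered)
{
    switch (cur)
    {
        case XLOG_FROM_ARCHIVE:
            return retry_triggered ? XLOG_FROM_PG_WAL : XLOG_FROM_ARCHIVE;
        case XLOG_FROM_PG_WAL:
            /* pg_wal exhausted: now try streaming from the primary. */
            return XLOG_FROM_STREAM;
        case XLOG_FROM_STREAM:
        default:
            return XLOG_FROM_STREAM;
    }
}
```

(In the real code, a failure in streaming would cycle back to the archive; the sketch only covers the forward ordering being debated here.)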
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Sun, Oct 9, 2022 at 3:22 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > As I mentioned upthread [0], I'm still a little concerned that this patch > will cause the state machine to go straight from archive recovery to > streaming replication, skipping recovery from pg_wal. > > [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13 Yes, it goes straight to streaming replication skipping recovery from pg_wal with the patch. > I wonder if this > could be resolved by moving the standby to the pg_wal phase instead. > Concretely, this line > > + if (switchSource) > + break; > > would instead change currentSource from XLOG_FROM_ARCHIVE to > XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI(). I suspect the > behavior would be basically the same, but it would maintain the existing > ordering. We can give it a chance to restore from pg_wal before switching to streaming so as not to change any behaviour of the state machine. But definitely not by setting currentSource to XLOG_FROM_WAL; we basically never explicitly set currentSource to XLOG_FROM_WAL, other than when not in archive recovery, i.e. when InArchiveRecovery is false. Also, see the comment [1]. Instead, the simplest approach would be to just pass XLOG_FROM_WAL to XLogFileReadAnyTLI() when we're about to switch the source to stream mode. This doesn't change the existing behaviour. > However, I do see the following note elsewhere in xlogrecovery.c: > > * The segment can be fetched via restore_command, or via walreceiver having > * streamed the record, or it can already be present in pg_wal. Checking > * pg_wal is mainly for crash recovery, but it will be polled in standby mode > * too, in case someone copies a new segment directly to pg_wal. That is not > * documented or recommended, though. > > Given this information, the present behavior might not be too important, > but I don't see a point in changing it without good reason. Yeah, with the attached patch we don't skip pg_wal before switching to streaming mode. 
I've also added a note in the 'Standby Server Operation' section about the new feature. Please review the v8 patch further. Unrelated to this patch, the fact that the standby polls pg_wal is not documented or recommended, is not true, it is actually documented [2]. Whether or not we change the docs to be something like [3], is a separate discussion. [1] /* * We just successfully read a file in pg_wal. We prefer files in * the archive over ones in pg_wal, so try the next file again * from the archive first. */ [2] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed. [3] The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed. However, copying of WAL files manually is not recommended. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Sun, Oct 09, 2022 at 02:39:47PM +0530, Bharath Rupireddy wrote: > We can give it a chance to restore from pg_wal before switching to > streaming so as not to change any behaviour of the state machine. But > definitely not by setting currentSource to XLOG_FROM_WAL; we basically > never explicitly set currentSource to XLOG_FROM_WAL, other than when > not in archive recovery, i.e. when InArchiveRecovery is false. Also, see the > comment [1]. > > Instead, the simplest approach would be to just pass XLOG_FROM_WAL to > XLogFileReadAnyTLI() when we're about to switch the source to stream > mode. This doesn't change the existing behaviour. It might be more consistent with existing behavior, but one thing I hadn't considered is that it might make your proposed feature ineffective when users are copying files straight into pg_wal. IIUC as long as the files are present in pg_wal, the source-switch logic won't kick in. > Unrelated to this patch, the fact that the standby polls pg_wal is not > documented or recommended, is not true, it is actually documented [2]. > Whether or not we change the docs to be something like [3], is a > separate discussion. I wonder if it would be better to simply remove this extra polling of pg_wal as a prerequisite to your patch. The existing commentary leads me to think there might not be a strong reason for this behavior, so it could be a nice way to simplify your patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > Instead, the simplest would be to just pass XLOG_FROM_WAL to > > XLogFileReadAnyTLI() when we're about to switch the source to stream > > mode. This doesn't change the existing behaviour. > > It might be more consistent with existing behavior, but one thing I hadn't > considered is that it might make your proposed feature ineffective when > users are copying files straight into pg_wal. IIUC as long as the files > are present in pg_wal, the source-switch logic won't kick in. It happens even now, that is, the server will not switch to streaming mode from the archive after a failure if there's someone continuously copying WAL files to the pg_wal directory. I have not personally seen anyone or any service doing that. It doesn't mean that can't happen. They might do it for some purpose such as 1) to bring back in sync quickly a standby that's lagging behind the primary after the archive connection and/or streaming replication connection are/is broken but many WAL files leftover on the primary 2) before promoting a standby that's lagging behind the primary for failover or other purposes. However, I'm not sure if someone does these things on production servers. > > Unrelated to this patch, the fact that the standby polls pg_wal is not > > documented or recommended, is not true, it is actually documented [2]. > > Whether or not we change the docs to be something like [3], is a > > separate discussion. > > I wonder if it would be better to simply remove this extra polling of > pg_wal as a prerequisite to your patch. The existing commentary leads me > to think there might not be a strong reason for this behavior, so it could > be a nice way to simplify your patch. I don't think it's a good idea to remove that completely. As said above, it might help someone, we never know. 
I think for this feature, we just need to decide on whether or not we'd allow pg_wal polling before switching to streaming mode. If we allow it like in the v8 patch, we can document the behavior. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> I wonder if it would be better to simply remove this extra polling of >> pg_wal as a prerequisite to your patch. The existing commentary leads me >> to think there might not be a strong reason for this behavior, so it could >> be a nice way to simplify your patch. > > I don't think it's a good idea to remove that completely. As said > above, it might help someone, we never know. It would be great to hear whether anyone is using this functionality. If no one is aware of existing usage and there is no interest in keeping it around, I don't think it would be unreasonable to remove it in v16. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> I wonder if it would be better to simply remove this extra polling of > >> pg_wal as a prerequisite to your patch. The existing commentary leads me > >> to think there might not be a strong reason for this behavior, so it could > >> be a nice way to simplify your patch. > > > > I don't think it's a good idea to remove that completely. As said > > above, it might help someone, we never know. > > It would be great to hear whether anyone is using this functionality. If > no one is aware of existing usage and there is no interest in keeping it > around, I don't think it would be unreasonable to remove it in v16. It seems the behaviour of exhausting all the WAL in pg_wal before switching to streaming after failing to fetch from the archive can't be removed. I found this after experimenting with it; here are my findings: 1. The standby has to recover the initial WAL files in the pg_wal directory even for the normal post-restart/first-time-start case, I mean, the non-crash-recovery case. 2. The standby may receive WAL files from the primary pretty fast (the walreceiver just writes and flushes the received WAL to WAL files under pg_wal) and/or standby recovery may be slow; say both the standby's connection to the primary and the archive connection are broken for whatever reason, then it has WAL files to recover in the pg_wal directory. I think the fundamental behaviour for the standby is that it has to fully recover to the end of WAL under pg_wal no matter who copies the WAL files there. I fully understand the consequences of manually copying WAL files into pg_wal; for that matter, manually copying/tinkering with any other files into/under the data directory is something we don't recommend or encourage. 
In summary, the standby state machine in WaitForWALToBecomeAvailable() exhausts all the WAL in pg_wal before switching to streaming after failing to fetch from archive. The v8 patch proposed upthread deviates from this behaviour. Hence, attaching v9 patch that keeps the behaviour as-is, that means, the standby exhausts all the WAL in pg_wal before switching to streaming after fetching WAL from archive for at least streaming_replication_retry_interval milliseconds. Please review the v9 patch further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
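The v9 behaviour described above reduces to a time-based check before the source switch is attempted. A minimal sketch, using plain millisecond counters instead of PostgreSQL's TimestampTz (the function name and variable layout are illustrative, not the patch itself):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative GUC, in milliseconds; a value <= 0 disables the feature. */
int streaming_replication_retry_interval = 5000;

/* Return true once the standby has spent at least the configured
 * interval fetching WAL from the archive; only then does it exhaust
 * pg_wal and attempt to switch back to streaming. */
bool
should_retry_streaming(int64_t switched_to_archive_at_ms, int64_t now_ms)
{
    if (streaming_replication_retry_interval <= 0)
        return false;           /* feature disabled */

    return (now_ms - switched_to_archive_at_ms) >=
           (int64_t) streaming_replication_retry_interval;
}
```

The key property is that the check only triggers the switch; the existing "drain pg_wal, then stream" ordering of the state machine is left intact.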
Re: Switching XLog source from archive to streaming when primary available
From
Ian Lawrence Barwick
Date:
2022年10月18日(火) 11:02 Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>: > > On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote: > > > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > >> I wonder if it would be better to simply remove this extra polling of > > >> pg_wal as a prerequisite to your patch. The existing commentary leads me > > >> to think there might not be a strong reason for this behavior, so it could > > >> be a nice way to simplify your patch. > > > > > > I don't think it's a good idea to remove that completely. As said > > > above, it might help someone, we never know. > > > > It would be great to hear whether anyone is using this functionality. If > > no one is aware of existing usage and there is no interest in keeping it > > around, I don't think it would be unreasonable to remove it in v16. > > It seems like exhausting all the WAL in pg_wal before switching to > streaming after failing to fetch from archive is unremovable. I found > this after experimenting with it, here are my findings: > 1. The standby has to recover initial WAL files in the pg_wal > directory even for the normal post-restart/first-time-start case, I > mean, in non-crash recovery case. > 2. The standby received WAL files from primary (walreceiver just > writes and flushes the received WAL to WAL files under pg_wal) > pretty-fast and/or standby recovery is slow, say both the standby > connection to primary and archive connection are broken for whatever > reasons, then it has WAL files to recover in pg_wal directory. > > I think the fundamental behaviour for the standy is that it has to > fully recover to the end of WAL under pg_wal no matter who copies WAL > files there. 
> I fully understand the consequences of manually copying > WAL files into pg_wal, for that matter, manually copying/tinkering any > other files into/under the data directory is something we don't > recommend or encourage. > > In summary, the standby state machine in WaitForWALToBecomeAvailable() > exhausts all the WAL in pg_wal before switching to streaming after > failing to fetch from archive. The v8 patch proposed upthread deviates > from this behaviour. Hence, attaching v9 patch that keeps the > behaviour as-is, that means, the standby exhausts all the WAL in > pg_wal before switching to streaming after fetching WAL from archive > for at least streaming_replication_retry_interval milliseconds. > > Please review the v9 patch further. Thanks for the updated patch. While reviewing the patch backlog, we have determined that this patch adds one or more TAP tests but has not added the test to the "meson.build" file. To do this, locate the relevant "meson.build" file for each test and add it in the 'tests' dictionary, which will look something like this: 'tap': { 'tests': [ 't/001_basic.pl', ], }, For some additional details please see this Wiki article: https://wiki.postgresql.org/wiki/Meson_for_patch_authors For more information on the meson build system for PostgreSQL see: https://wiki.postgresql.org/wiki/Meson Regards Ian Barwick
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Nov 16, 2022 at 9:38 AM Ian Lawrence Barwick <barwick@gmail.com> wrote: > > While reviewing the patch backlog, we have determined that this patch adds > one or more TAP tests but has not added the test to the "meson.build" file. Thanks for pointing it out. Yeah, the test wasn't being picked up by meson builds. I added the new test file name to src/test/recovery/meson.build. I'm attaching the v10 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote: > In summary, the standby state machine in WaitForWALToBecomeAvailable() > exhausts all the WAL in pg_wal before switching to streaming after > failing to fetch from archive. The v8 patch proposed upthread deviates > from this behaviour. Hence, attaching v9 patch that keeps the > behaviour as-is, that means, the standby exhausts all the WAL in > pg_wal before switching to streaming after fetching WAL from archive > for at least streaming_replication_retry_interval milliseconds. I think this is okay. The following comment explains why archives are preferred over existing files in pg_wal: * When doing archive recovery, we always prefer an archived log file even * if a file of the same name exists in XLOGDIR. The reason is that the * file in XLOGDIR could be an old, un-filled or partly-filled version * that was copied and restored as part of backing up $PGDATA. With your patch, we might replay one of these "old" files in pg_wal instead of the complete version of the file from the archives, but I think that is still correct. We'll just replay whatever exists in pg_wal (which may be un-filled or partly-filled) before attempting streaming. If that fails, we'll go back to trying the archives again. Would you mind testing this scenario? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote: > > In summary, the standby state machine in WaitForWALToBecomeAvailable() > > exhausts all the WAL in pg_wal before switching to streaming after > > failing to fetch from archive. The v8 patch proposed upthread deviates > > from this behaviour. Hence, attaching v9 patch that keeps the > > behaviour as-is, that means, the standby exhausts all the WAL in > > pg_wal before switching to streaming after fetching WAL from archive > > for at least streaming_replication_retry_interval milliseconds. > > I think this is okay. The following comment explains why archives are > preferred over existing files in pg_wal: > > * When doing archive recovery, we always prefer an archived log file even > * if a file of the same name exists in XLOGDIR. The reason is that the > * file in XLOGDIR could be an old, un-filled or partly-filled version > * that was copied and restored as part of backing up $PGDATA. > > With your patch, we might replay one of these "old" files in pg_wal instead > of the complete version of the file from the archives, That's true even today, without the patch, no? We're not changing the existing behaviour of the state machine. Can you explain how it happens with the patch? On HEAD, after failing to read from the archive, exhaust all wal from pg_wal and then switch to streaming mode. With the patch, after reading from the archive for at least streaming_replication_retry_interval milliseconds, exhaust all wal from pg_wal and then switch to streaming mode. > but I think that is > still correct. We'll just replay whatever exists in pg_wal (which may be > un-filled or partly-filled) before attempting streaming. If that fails, > we'll go back to trying the archives again. > > Would you mind testing this scenario? How about something like below for testing the above scenario? 
If it looks okay, I can add it as a new TAP test file.

1. Generate WAL files f1 and f2 and archive them.
2. Check the replay lsn and WAL file name on the standby; when it replays up to f2, stop the standby.
3. Set recovery to fail on the standby, and stop the standby.
4. Generate f3, f4 (partially filled) on the primary.
5. Manually copy f3, f4 to the standby's pg_wal.
6. Start the standby; since recovery is set to fail and there are new WAL files (f3, f4) under its pg_wal, it must replay those WAL files (check the replay lsn and WAL file name, it must be f4) before switching to streaming.
7. Generate f5 on the primary.
8. The standby should receive f5 and replay it (check the replay lsn and WAL file name, it must be f5).
9. Set streaming to fail on the standby and set recovery to succeed.
10. Generate f6 on the primary.
11. The standby should receive f6 via archive and replay it (check the replay lsn and WAL file name, it must be f6).

If needed, we can look out for these messages to confirm it works as expected:

    elog(DEBUG2, "switched WAL source from %s to %s after %s",
         xlogSourceNames[oldSource], xlogSourceNames[currentSource],
         lastSourceFailed ? "failure" : "success");

    ereport(LOG,
            (errmsg("restored log file \"%s\" from archive", xlogfname)));

Essentially, it covers what the documentation https://www.postgresql.org/docs/devel/warm-standby.html says:

"In standby mode, the server continuously applies WAL received from the primary server. The standby server can read WAL from a WAL archive (see restore_command) or directly from the primary over a TCP connection (streaming replication). The standby server will also attempt to restore any WAL found in the standby cluster's pg_wal directory. That typically happens after a server restart, when the standby replays again WAL that was streamed from the primary before the restart, but you can also manually copy files to pg_wal at any time to have them replayed."

Thoughts?
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote: > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> With your patch, we might replay one of these "old" files in pg_wal instead >> of the complete version of the file from the archives, > > That's true even today, without the patch, no? We're not changing the > existing behaviour of the state machine. Can you explain how it > happens with the patch? My point is that on HEAD, we will always prefer a complete archive file. With your patch, we might instead choose to replay an old file in pg_wal because we are artificially advancing the state machine. IOW even if there's a complete archive available, we might not use it. This is a behavior change, but I think it is okay. >> Would you mind testing this scenario? > > How about something like below for testing the above scenario? If it > looks okay, I can add it as a new TAP test file. > > 1. Generate WAL files f1 and f2 and archive them. > 2. Check the replay lsn and WAL file name on the standby, when it > replays upto f2, stop the standby. > 3. Set recovery to fail on the standby, and stop the standby. > 4. Generate f3, f4 (partially filled) on the primary. > 5. Manually copy f3, f4 to the standby's pg_wal. > 6. Start the standby, since recovery is set to fail, and there're new > WAL files (f3, f4) under its pg_wal, it must replay those WAL files > (check the replay lsn and WAL file name, it must be f4) before > switching to streaming. > 7. Generate f5 on the primary. > 8. The standby should receive f5 and replay it (check the replay lsn > and WAL file name, it must be f5). > 9. Set streaming to fail on the standby and set recovery to succeed. > 10. Generate f6 on the primary. > 11. The standby should receive f6 via archive and replay it (check the > replay lsn and WAL file name, it must be f6). 
I meant testing the scenario where there's an old file in pg_wal, a complete file in the archives, and your new GUC forces replay of the former. This might be difficult to do in a TAP test. Ultimately, I just want to validate the assumptions discussed above. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Thu, Jan 19, 2023 at 6:20 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> With your patch, we might replay one of these "old" files in pg_wal instead
> >> of the complete version of the file from the archives,
> >
> > That's true even today, without the patch, no? We're not changing the
> > existing behaviour of the state machine. Can you explain how it
> > happens with the patch?
>
> My point is that on HEAD, we will always prefer a complete archive file.
> With your patch, we might instead choose to replay an old file in pg_wal
> because we are artificially advancing the state machine. IOW even if
> there's a complete archive available, we might not use it. This is a
> behavior change, but I think it is okay.

Oh, yeah, I too agree that it's okay because manually copying WAL files directly to pg_wal (which eventually get replayed before switching to streaming) isn't recommended anyway for production-level servers. I think we covered it in the documentation that it exhausts all the WAL present in pg_wal before switching. Isn't that enough?

+         Specifies amount of time after which standby attempts to switch WAL
+         source from WAL archive to streaming replication (get WAL from
+         primary). However, exhaust all the WAL present in pg_wal before
+         switching. If the standby fails to switch to stream mode, it falls
+         back to archive mode.

> >> Would you mind testing this scenario?
> >
> > The standby should receive f6 via archive and replay it (check the
> > replay lsn and WAL file name, it must be f6).
>
> I meant testing the scenario where there's an old file in pg_wal, a
> complete file in the archives, and your new GUC forces replay of the
> former. This might be difficult to do in a TAP test. Ultimately, I just
> want to validate the assumptions discussed above.

I think testing the scenario [1] is achievable.
I could write a TAP test for it - https://github.com/BRupireddy/postgres/tree/prefer_archived_wal_v1. It's a bit flaky and needs a little more work to stabilize it: (1) writing a custom script for restore_command that sleeps only after fetching an existing WAL file from archive, not sleeping for a history file or a non-existent WAL file; (2) finding a command-line way to sleep on Windows. But it seems doable. I can spend some more time on it if anyone thinks the test is worth adding to the core, perhaps discussing it separately from this thread.

[1] RestoreArchivedFile():

    /*
     * When doing archive recovery, we always prefer an archived log file even
     * if a file of the same name exists in XLOGDIR. The reason is that the
     * file in XLOGDIR could be an old, un-filled or partly-filled version
     * that was copied and restored as part of backing up $PGDATA.
     *

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Nov 16, 2022 at 11:39 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > I'm attaching the v10 patch for further review. Needed a rebase. I'm attaching the v11 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Feb 24, 2023 at 10:26 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Nov 16, 2022 at 11:39 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > I'm attaching the v10 patch for further review. > > Needed a rebase. I'm attaching the v11 patch for further review. Needed a rebase, so attaching the v12 patch. I word-smithed comments and docs a bit. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Tue, Apr 25, 2023 at 9:27 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > > Needed a rebase. I'm attaching the v11 patch for further review. > > Needed a rebase, so attaching the v12 patch. I word-smithed comments > and docs a bit. Needed a rebase. I'm attaching the v13 patch for further consideration. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Fri, Jul 21, 2023 at 12:38 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Needed a rebase. I'm attaching the v13 patch for further consideration. Needed a rebase. I'm attaching the v14 patch. It also has the following changes: - Ran pgindent on the new source code. - Ran pgperltidy on the new TAP test. - Improved the newly added TAP test a bit. Used the new wait_for_log core TAP function in place of custom find_in_log. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Sat, Oct 21, 2023 at 11:59 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Jul 21, 2023 at 12:38 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Needed a rebase. I'm attaching the v13 patch for further consideration. > > Needed a rebase. I'm attaching the v14 patch. It also has the following changes: > > - Ran pgindent on the new source code. > - Ran pgperltidy on the new TAP test. > - Improved the newly added TAP test a bit. Used the new wait_for_log > core TAP function in place of custom find_in_log. > > Thoughts? I took a closer look at v14 and came up with the following changes: 1. Used advance_wal introduced by commit c161ab74f7. 2. Simplified the core logic and new TAP tests. 3. Reworded the comments and docs. 4. Simplified new DEBUG messages. I've attached the v15 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Thu, Dec 28, 2023 at 5:26 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > I took a closer look at v14 and came up with the following changes: > > 1. Used advance_wal introduced by commit c161ab74f7. > 2. Simplified the core logic and new TAP tests. > 3. Reworded the comments and docs. > 4. Simplified new DEBUG messages. > > I've attached the v15 patch for further review. Per a recent commit c538592, FATAL-ized perl warnings in the newly added TAP test and attached the v16 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Jan 3, 2024 at 4:58 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Dec 28, 2023 at 5:26 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > I took a closer look at v14 and came up with the following changes: > > > > 1. Used advance_wal introduced by commit c161ab74f7. > > 2. Simplified the core logic and new TAP tests. > > 3. Reworded the comments and docs. > > 4. Simplified new DEBUG messages. > > > > I've attached the v15 patch for further review. > > Per a recent commit c538592, FATAL-ized perl warnings in the newly > added TAP test and attached the v16 patch. Needed a rebase due to commit 776621a (conflict in src/test/recovery/meson.build for new TAP test file added). Please find the attached v17 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Jan 31, 2024 at 6:30 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > Needed a rebase due to commit 776621a (conflict in > src/test/recovery/meson.build for new TAP test file added). Please > find the attached v17 patch. Strengthened tests a bit by using recovery_min_apply_delay to mimic standby spending some time fetching from archive. PSA v18 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, 19 Feb 2024 at 18:36, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Wed, Jan 31, 2024 at 6:30 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: >> >> Needed a rebase due to commit 776621a (conflict in >> src/test/recovery/meson.build for new TAP test file added). Please >> find the attached v17 patch. > > Strengthened tests a bit by using recovery_min_apply_delay to mimic > standby spending some time fetching from archive. PSA v18 patch. Here are some minor comments: [1] + primary). However, the standby exhausts all the WAL present in pg_wal s|pg_wal|<filename>pg_wal</filename>|g [2] +# Ensure checkpoint doesn't come in our way +$primary->append_conf('postgresql.conf', qq( + min_wal_size = 2MB + max_wal_size = 1GB + checkpoint_timeout = 1h + autovacuum = off +)); Keeping the same indentation might be better.
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Mon, Feb 19, 2024 at 8:25 PM Japin Li <japinli@hotmail.com> wrote: > > > Strengthened tests a bit by using recovery_min_apply_delay to mimic > > standby spending some time fetching from archive. PSA v18 patch. > > Here are some minor comments: Thanks for taking a look at it. > [1] > + primary). However, the standby exhausts all the WAL present in pg_wal > > s|pg_wal|<filename>pg_wal</filename>|g Done. > [2] > +# Ensure checkpoint doesn't come in our way > +$primary->append_conf('postgresql.conf', qq( > + min_wal_size = 2MB > + max_wal_size = 1GB > + checkpoint_timeout = 1h > + autovacuum = off > +)); > > Keeping the same indentation might be better. The autovacuum line looks mis-indented in the patch file. However, I now ran src/tools/pgindent/perltidyrc src/test/recovery/t/041_wal_source_switch.pl on it. Please see the attached v19 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, 20 Feb 2024 at 13:40, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Mon, Feb 19, 2024 at 8:25 PM Japin Li <japinli@hotmail.com> wrote: >> [2] >> +# Ensure checkpoint doesn't come in our way >> +$primary->append_conf('postgresql.conf', qq( >> + min_wal_size = 2MB >> + max_wal_size = 1GB >> + checkpoint_timeout = 1h >> + autovacuum = off >> +)); >> >> Keeping the same indentation might be better. > > The autovacuum line looks mis-indented in the patch file. However, I > now ran src/tools/pgindent/perltidyrc > src/test/recovery/t/041_wal_source_switch.pl on it. > Thanks for updating the patch. It seems still with the wrong indent. diff --git a/src/test/recovery/t/041_wal_source_switch.pl b/src/test/recovery/t/041_wal_source_switch.pl index 082680bf4a..b5eddba1d5 100644 --- a/src/test/recovery/t/041_wal_source_switch.pl +++ b/src/test/recovery/t/041_wal_source_switch.pl @@ -18,9 +18,9 @@ $primary->init( # Ensure checkpoint doesn't come in our way $primary->append_conf( 'postgresql.conf', qq( - min_wal_size = 2MB - max_wal_size = 1GB - checkpoint_timeout = 1h + min_wal_size = 2MB + max_wal_size = 1GB + checkpoint_timeout = 1h autovacuum = off )); $primary->start; @@ -85,7 +85,7 @@ my $offset = -s $standby->logfile; my $apply_delay = $retry_interval * 5; $standby->append_conf( 'postgresql.conf', qq( -recovery_min_apply_delay = '${apply_delay}ms' + recovery_min_apply_delay = '${apply_delay}ms' )); $standby->start;
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Tue, Feb 20, 2024 at 11:54 AM Japin Li <japinli@hotmail.com> wrote: > > > On Tue, 20 Feb 2024 at 13:40, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Feb 19, 2024 at 8:25 PM Japin Li <japinli@hotmail.com> wrote: > >> [2] > >> +# Ensure checkpoint doesn't come in our way > >> +$primary->append_conf('postgresql.conf', qq( > >> + min_wal_size = 2MB > >> + max_wal_size = 1GB > >> + checkpoint_timeout = 1h > >> + autovacuum = off > >> +)); > >> > >> Keeping the same indentation might be better. > > > > The autovacuum line looks mis-indented in the patch file. However, I > > now ran src/tools/pgindent/perltidyrc > > src/test/recovery/t/041_wal_source_switch.pl on it. > > > > Thanks for updating the patch. It seems still with the wrong indent. Thanks. perltidyrc didn't complain about anything on v19. However, I kept the alignment same as other TAP tests for multi-line append_conf. If that's not correct, I'll leave it to the committer to decide. PSA v20 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
cfbot claims that this one needs another rebase.

I've spent some time thinking about this one. I'll admit I'm a bit worried about adding more complexity to this state machine, but I also haven't thought of any other viable approaches, and this still seems like a useful feature. So, for now, I think we should continue with the current approach.

+         fails to switch to stream mode, it falls back to archive mode. If this
+         parameter value is specified without units, it is taken as
+         milliseconds. Default is <literal>5min</literal>. With a lower value

Does this really need to be milliseconds? I would think that any reasonable setting would be at least on the order of seconds.

+         attempts. To avoid this, it is recommended to set a reasonable value.

I think we might want to suggest what a "reasonable value" is.

+ static bool canSwitchSource = false;
+ bool switchSource = false;

IIUC "canSwitchSource" indicates that we are trying to force a switch to streaming, but we are currently exhausting anything that's present in the pg_wal directory, while "switchSource" indicates that we should force a switch to streaming right now. Furthermore, "canSwitchSource" is static while "switchSource" is not. Is there any way to simplify this? For example, would it be possible to make an enum that tracks the streaming_replication_retry_interval state?

 			/*
 			 * Don't allow any retry loops to occur during nonblocking
-			 * readahead. Let the caller process everything that has been
-			 * decoded already first.
+			 * readahead if we failed to read from the current source. Let the
+			 * caller process everything that has been decoded already first.
 			 */
-			if (nonblocking)
+			if (nonblocking && lastSourceFailed)
 				return XLREAD_WOULDBLOCK;

Why do we skip this when "switchSource" is set?
+ /* Reset the WAL source switch state */ + if (switchSource) + { + Assert(canSwitchSource); + Assert(currentSource == XLOG_FROM_STREAM); + Assert(oldSource == XLOG_FROM_ARCHIVE); + switchSource = false; + canSwitchSource = false; + } How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is there no way it could be XLOG_FROM_PG_WAL? +#streaming_replication_retry_interval = 5min # time after which standby + # attempts to switch WAL source from archive to + # streaming replication + # in milliseconds; 0 disables I think we might want to turn this feature off by default, at least for the first release. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > cfbot claims that this one needs another rebase. Yeah, the conflict was with the new TAP test file name in src/test/recovery/meson.build. > I've spent some time thinking about this one. I'll admit I'm a bit worried > about adding more complexity to this state machine, but I also haven't > thought of any other viable approaches, Right. I understand that the WaitForWALToBecomeAvailable()'s state machine is a complex piece. > and this still seems like a useful > feature. So, for now, I think we should continue with the current > approach. Yes, the feature is useful like mentioned in the docs as below: + Reading WAL from archive may not always be as efficient and fast as + reading from primary. This can be due to the differences in disk types, + IO costs, network latencies etc. All of these can impact the recovery + performance on standby, and can increase the replication lag on + primary. In addition, the primary keeps accumulating WAL needed for the + standby while the standby reads WAL from archive, because the standby + replication slot stays inactive. To avoid these problems, one can use + this parameter to make standby switch to stream mode sooner. > + fails to switch to stream mode, it falls back to archive mode. If this > + parameter value is specified without units, it is taken as > + milliseconds. Default is <literal>5min</literal>. With a lower value > > Does this really need to be milliseconds? I would think that any > reasonable setting would at least on the order of seconds. Agreed. Done that way. > + attempts. To avoid this, it is recommended to set a reasonable value. > > I think we might want to suggest what a "reasonable value" is. It really depends on the WAL generation rate on the primary. If the WAL files grow faster, the disk runs out of space sooner, so setting a value to make frequent WAL source switch attempts can help. 
It's hard to suggest a one-size-fits-all value. Therefore, I've tweaked the docs a bit to reflect the fact that it depends on the WAL generation rate. > + static bool canSwitchSource = false; > + bool switchSource = false; > > IIUC "canSwitchSource" indicates that we are trying to force a switch to > streaming, but we are currently exhausting anything that's present in the > pg_wal directory, Right. > while "switchSource" indicates that we should force a > switch to streaming right now. It's not indicating force switch, it says "previously I was asked to switch source via canSwitchSource, now that I've exhausted all the WAL from the pg_wal directory, I'll make a source switch attempt now". > Furthermore, "canSwitchSource" is static > while "switchSource" is not. This is because the WaitForWALToBecomeAvailable() has to remember the decision (that streaming_replication_retry_interval has occurred) across the calls. And, switchSource is decided within WaitForWALToBecomeAvailable() for every function call. > Is there any way to simplify this? For > example, would it be possible to make an enum that tracks the > streaming_replication_retry_interval state? I guess the way it is right now looks simple IMHO. If the suggestion is to have an enum like below; it looks overkill for just two states. typedef enum { CAN_SWITCH_SOURCE, SWITCH_SOURCE } XLogSourceSwitchState; > /* > * Don't allow any retry loops to occur during nonblocking > - * readahead. Let the caller process everything that has been > - * decoded already first. > + * readahead if we failed to read from the current source. Let the > + * caller process everything that has been decoded already first. > */ > - if (nonblocking) > + if (nonblocking && lastSourceFailed) > return XLREAD_WOULDBLOCK; > > Why do we skip this when "switchSource" is set? It was leftover from the initial version of the patch - I was then encountering some issue and had that piece there. Removed it now. 
> + /* Reset the WAL source switch state */ > + if (switchSource) > + { > + Assert(canSwitchSource); > + Assert(currentSource == XLOG_FROM_STREAM); > + Assert(oldSource == XLOG_FROM_ARCHIVE); > + switchSource = false; > + canSwitchSource = false; > + } > > How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is > there no way it could be XLOG_FROM_PG_WAL? No. switchSource is set to true only when canSwitchSource is set to true, which happens only when currentSource is XLOG_FROM_ARCHIVE (see SwitchWALSourceToPrimary()). > +#streaming_replication_retry_interval = 5min # time after which standby > + # attempts to switch WAL source from archive to > + # streaming replication > + # in milliseconds; 0 disables > > I think we might want to turn this feature off by default, at least for the > first release. Agreed. Done that way. Please see the attached v21 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Tue, Mar 05, 2024 at 11:38:37PM +0530, Bharath Rupireddy wrote: > On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> Is there any way to simplify this? For >> example, would it be possible to make an enum that tracks the >> streaming_replication_retry_interval state? > > I guess the way it is right now looks simple IMHO. If the suggestion > is to have an enum like below; it looks overkill for just two states. > > typedef enum > { > CAN_SWITCH_SOURCE, > SWITCH_SOURCE > } XLogSourceSwitchState; I was thinking of something more like typedef enum { NO_FORCE_SWITCH_TO_STREAMING, /* no switch necessary */ FORCE_SWITCH_TO_STREAMING_PENDING, /* exhausting pg_wal */ FORCE_SWITCH_TO_STREAMING, /* switch to streaming now */ } WALSourceSwitchState; At least, that illustrates my mental model of the process here. IMHO that's easier to follow than two similarly-named bool variables. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Mar 6, 2024 at 1:22 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > I was thinking of something more like > > typedef enum > { > NO_FORCE_SWITCH_TO_STREAMING, /* no switch necessary */ > FORCE_SWITCH_TO_STREAMING_PENDING, /* exhausting pg_wal */ > FORCE_SWITCH_TO_STREAMING, /* switch to streaming now */ > } WALSourceSwitchState; > > At least, that illustrates my mental model of the process here. IMHO > that's easier to follow than two similarly-named bool variables. I played with that idea and it came out very nice. Please see the attached v22 patch. Note that personally I didn't like "FORCE" being there in the names, so I've simplified them a bit. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Nathan Bossart
Date:
On Wed, Mar 06, 2024 at 10:02:43AM +0530, Bharath Rupireddy wrote: > On Wed, Mar 6, 2024 at 1:22 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> I was thinking of something more like >> >> typedef enum >> { >> NO_FORCE_SWITCH_TO_STREAMING, /* no switch necessary */ >> FORCE_SWITCH_TO_STREAMING_PENDING, /* exhausting pg_wal */ >> FORCE_SWITCH_TO_STREAMING, /* switch to streaming now */ >> } WALSourceSwitchState; >> >> At least, that illustrates my mental model of the process here. IMHO >> that's easier to follow than two similarly-named bool variables. > > I played with that idea and it came out very nice. Please see the > attached v22 patch. Note that personally I didn't like "FORCE" being > there in the names, so I've simplified them a bit. Thanks. I'd like to spend some time testing this, but from a glance, the code appears to be in decent shape. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Wed, Mar 6, 2024 at 9:49 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > I played with that idea and it came out very nice. Please see the > > attached v22 patch. Note that personally I didn't like "FORCE" being > > there in the names, so I've simplified them a bit. > > Thanks. I'd like to spend some time testing this, but from a glance, the > code appears to be in decent shape. Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a conflict in meson.build. Please see the attached v23 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Michael Paquier
Date:
On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote: > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a > conflict in meson.build. Please see the attached v23 patch. I've been reading this patch, and this is a very tricky one. Please be *very* cautious. +#streaming_replication_retry_interval = 0 # time after which standby + # attempts to switch WAL source from archive to + # streaming replication in seconds; 0 disables This stuff allows a minimal retry interval of 1s. Could it be useful to have more responsiveness here and allow lower values than that? Why not switching the units to be milliseconds? + if (streaming_replication_retry_interval <= 0 || + !StandbyMode || + currentSource != XLOG_FROM_ARCHIVE) + return SWITCH_TO_STREAMING_NONE; Hmm. Perhaps this should mention why we don't care about the consistent point. + /* See if we can switch WAL source to streaming */ + if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE) + wal_source_switch_state = SwitchWALSourceToPrimary(); Rather than a routine that returns as result the value to use for the GUC, I'd suggest to let this routine set the GUC as there is only one caller of SwitchWALSourceToPrimary(). This can also include a check on SWITCH_TO_STREAMING_NONE, based on what I'm reading that. - if (lastSourceFailed) + if (lastSourceFailed || + wal_source_switch_state == SWITCH_TO_STREAMING) Hmm. This one may be tricky. I'd recommend a separation between the failure in reading from a source and the switch to a new "forced" source. + if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING) + readFrom = XLOG_FROM_PG_WAL; + else + readFrom = currentSource == XLOG_FROM_ARCHIVE ? + XLOG_FROM_ANY : currentSource; WALSourceSwitchState looks confusing here, and are you sure that this is actually correct? Shouldn't we still try a READ_FROM_ANY or a read from the archives even with a streaming pending. By the way, I am not convinced that what you have is the best interface ever. 
This assumes that we'd always want to switch to streaming more aggressively. Could there be a point in also controlling if we should switch to pg_wal/ or just to archiving more aggressively as well, aka be able to do the opposite switch of WAL source? This design looks somewhat limited to me. The origin of the issue is that we don't have a way to control the order of the sources consumed by WAL replay. Perhaps something like a replay_source_order that uses a list would be better, with elements settable to archive, pg_wal and streaming? -- Michael
Attachment
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
On Mon, Mar 18, 2024 at 11:38 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote: > > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a > > conflict in meson.build. Please see the attached v23 patch. > > I've been reading this patch, and this is a very tricky one. Please > be *very* cautious. Thanks for looking into this. > +#streaming_replication_retry_interval = 0 # time after which standby > + # attempts to switch WAL source from archive to > + # streaming replication in seconds; 0 disables > > This stuff allows a minimal retry interval of 1s. Could it be useful > to have more responsiveness here and allow lower values than that? > Why not switching the units to be milliseconds? Nathan had a different view on this to have it on the order of seconds - https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13. If set to too low a value, the frequency of the standby trying to connect to the primary increases. IMO, the order of seconds seems fine. > + if (streaming_replication_retry_interval <= 0 || > + !StandbyMode || > + currentSource != XLOG_FROM_ARCHIVE) > + return SWITCH_TO_STREAMING_NONE; > > Hmm. Perhaps this should mention why we don't care about the > consistent point. Are you asking why we don't care whether the standby reached a consistent point when switching to streaming mode due to this new parameter? If this is the ask, the same applies when a standby typically switches to streaming replication (getting WAL from the primary) today, that is, when restoring from the WAL archive finishes (no more WAL left there) or fails for any reason. The standby doesn't care about the consistent point even today; it just trusts the WAL source and makes the switch. 
> + /* See if we can switch WAL source to streaming */ > + if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE) > + wal_source_switch_state = SwitchWALSourceToPrimary(); > > Rather than a routine that returns as result the value to use for the > GUC, I'd suggest to let this routine set the GUC as there is only one > caller of SwitchWALSourceToPrimary(). This can also include a check > on SWITCH_TO_STREAMING_NONE, based on what I'm reading that. Firstly, wal_source_switch_state is not a GUC, it's a static variable to be used across WaitForWALToBecomeAvailable calls. And, if you are suggesting to change SwitchWALSourceToPrimary so that it sets wal_source_switch_state directly, I'd not do that because when wal_source_switch_state is not SWITCH_TO_STREAMING_NONE, the function gets called unnecessarily. > - if (lastSourceFailed) > + if (lastSourceFailed || > + wal_source_switch_state == SWITCH_TO_STREAMING) > > Hmm. This one may be tricky. I'd recommend a separation between the > failure in reading from a source and the switch to a new "forced" > source. Separation would just add duplicate code. Moreover, the code wrapped within if (lastSourceFailed) doesn't do any error handling as such; it just resets a few things from the previous source and sets the next source. FWIW, please check [1] (and the discussion thereon) for how the lastSourceFailed flag is being used to consume all the streamed WAL in pg_wal directly upon detecting the promotion trigger file. Therefore, I see no problem with the way it is right now for this new feature. [1] /* * Data not here yet. Check for trigger, then wait for * walreceiver to wake us up when new WAL arrives. */ if (CheckForStandbyTrigger()) { /* * Note that we don't return XLREAD_FAIL immediately * here. After being triggered, we still want to * replay all the WAL that was already streamed. 
It's * in pg_wal now, so we just treat this as a failure, * and the state machine will move on to replay the * streamed WAL from pg_wal, and then recheck the * trigger and exit replay. */ lastSourceFailed = true; > + if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING) > + readFrom = XLOG_FROM_PG_WAL; > + else > + readFrom = currentSource == XLOG_FROM_ARCHIVE ? > + XLOG_FROM_ANY : currentSource; > > WALSourceSwitchState looks confusing here, and are you sure that this > is actualy correct? Shouldn't we still try a READ_FROM_ANY or a read > from the archives even with a streaming pending. Please see the discussion starting from https://www.postgresql.org/message-id/20221008215221.GA894639%40nathanxps13. We wanted to keep the existing behaviour the same when we intentionally switch the source from archive to streaming due to the timeout. The existing behaviour is to exhaust the WAL in pg_wal before switching the source to streaming after a failure to fetch from the archive. When wal_source_switch_state is SWITCH_TO_STREAMING_PENDING, the currentSource is already XLOG_FROM_ARCHIVE (to clear the dust off here, I've added an assert now in the attached new v24 patch). And, we don't want to pass XLOG_FROM_ANY to XLogFileReadAnyTLI to again fetch from the archive. Hence, we choose readFrom = XLOG_FROM_PG_WAL to specifically tell XLogFileReadAnyTLI to read from pg_wal directly.
Perhaps something like a replay_source_order > that uses a list would be better, with elements settable to archive, > pg_wal and streaming? The intention of this feature is to provide a way for the streaming standby to quickly detect when the primary is up and running, without having to wait until either all the WAL in the archive is exhausted or a failure to fetch from the archive happens. Advantages of this feature are: 1) it can make the recovery a bit faster (if fetching from the archive adds costs from different storage types, IO costs and network delays), thus reducing the replication lag; 2) the primary (if using a replication slot based streaming replication setup) doesn't have to keep the required WAL for the standby for longer durations, thus reducing the risk of no-space-left-on-disk issues. IMHO, it makes sense to have something like replay_source_order if any use case arises in future requiring the standby to intentionally switch to pg_wal or archive. But not as part of this feature. Please see the attached v24 patch. I've added an assertion that the current source is archive before calling XLogFileReadAnyTLI if wal_source_switch_state is SWITCH_TO_STREAMING_PENDING. I've also added the new enum WALSourceSwitchState to typedefs.list to make pgindent adjust it correctly. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
Hi, I took a brief look at the patch. On the motivation front, I can see this being useful for synchronous replicas if you have commit set to flush mode. So +1 on feature, easier configurability, although thinking about it more you could probably have the restore script be smarter and provide non-zero exit codes periodically. The patch needs to be rebased but I tested this against an older 17 build. > + ereport(DEBUG1, > + errmsg_internal("switched WAL source from %s to %s after %s", > + xlogSourceNames[oldSource], Not sure if you're intentionally changing from DEBUG2 to DEBUG1. > * standby and increase the replication lag on primary. Do you mean "increase replication lag on standby"? nit: reading from archive *could* be faster since in theory it's not single-processed/threaded. > However, > + * exhaust all the WAL present in pg_wal before switching. If successful, > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls > + * back to XLOG_FROM_ARCHIVE state. I think I'm missing how this happens. Or what "successful" means. If I'm reading it right, no matter what happens we will always move to XLOG_FROM_STREAM based on how the state machine works? I tested this in a basic RR setup without replication slots (e.g. log shipping) where the WAL is available in the archive but the primary always has the WAL rotated out and 'streaming_replication_retry_interval = 1'. This leads the RR to become stuck where it stops fetching from archive and loops between XLOG_FROM_PG_WAL and XLOG_FROM_STREAM. When 'streaming_replication_retry_interval' is breached, we transition from {currentSource, wal_source_switch_state} {XLOG_FROM_ARCHIVE, SWITCH_TO_STREAMING_NONE} -> {XLOG_FROM_ARCHIVE, SWITCH_TO_STREAMING_PENDING} with readFrom = XLOG_FROM_PG_WAL. That reads the last record successfully in pg_wal and then fails to read the next one because it doesn't exist, transitioning to {XLOG_FROM_STREAM, SWITCH_TO_STREAMING_PENDING}. 
XLOG_FROM_STREAM fails because the WAL is no longer there on the primary, so it sets it back to {XLOG_FROM_ARCHIVE, SWITCH_TO_STREAMING_PENDING}. > last_fail_time = now; > currentSource = XLOG_FROM_ARCHIVE; > break; Since the state is still SWITCH_TO_STREAMING_PENDING from the previous loops, it forces > Assert(currentSource == XLOG_FROM_ARCHIVE); > readFrom = XLOG_FROM_PG_WAL; > ... > readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2, readFrom); And this readFile call seems to always succeed since it can read the latest WAL record but not the next one, which is in the archive, leading to a transition back to XLOG_FROM_STREAM, and the cycle repeats. > /* > * Nope, not found in archive or pg_wal. > */ > lastSourceFailed = true; I don't think this gets triggered for the XLOG_FROM_PG_WAL case, which means the safety check you added doesn't actually kick in. > if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING) > { > wal_source_switch_state = SWITCH_TO_STREAMING; > elog(LOG, "SWITCH_TO_STREAMING_PENDING TO SWITCH_TO_STREAMING"); > } Thanks -- John Hsu - Amazon Web Services
Re: Switching XLog source from archive to streaming when primary available
From
Bharath Rupireddy
Date:
Hi, Thanks for looking into this. On Fri, Aug 23, 2024 at 5:03 AM John H <johnhyvr@gmail.com> wrote: > > For a motivation aspect I can see this being useful > synchronous_replicas if you have commit set to flush mode. In a synchronous replication setup, until the standby finishes fetching WAL from the archive, commits on the primary have to wait, which can increase query latency. If the standby can connect to the primary as soon as the broken connection is restored, it can fetch the WAL sooner and transaction commits can continue on the primary. Is my understanding correct? Is there anything more to this? I talked to Michael Paquier at PGConf.Dev 2024 and heard some concerns about how this feature deals with changing timelines. I can't recall them right now. And, there were some cautions raised upthread - https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13 and https://www.postgresql.org/message-id/ZffaQt7UbM2Q9kYh%40paquier.xyz. > So +1 on feature, easier configurability, although thinking about it > more you could probably have the restore script be smarter and provide > non-zero exit codes periodically. Interesting. Yes, the restore script has to be smarter to detect broken connections and distinguish whether the server is performing just archive recovery/PITR or streaming as a standby. Not doing it right, perhaps, can cause data loss (?). > The patch needs to be rebased but I tested this against an older 17 build. Will rebase soon. > > + ereport(DEBUG1, > > + errmsg_internal("switched WAL source from %s to %s after %s", > > + xlogSourceNames[oldSource], > > Not sure if you're intentionally changing from DEBUG1 from DEBUG2. Will change. > > * standby and increase the replication lag on primary. > > Do you mean "increase replication lag on standby"? > nit: reading from archive *could* be faster since you could in theory > it's not single-processed/threaded. Yes. 
I think we can just say "All of these can impact the recovery performance on + * standby and increase the replication lag." > > However, > > + * exhaust all the WAL present in pg_wal before switching. If successful, > > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls > > + * back to XLOG_FROM_ARCHIVE state. > > I think I'm missing how this happens. Or what "successful" means. If I'm reading > it right, no matter what happens we will always move to > XLOG_FROM_STREAM based on how > the state machine works? Please have a look at some discussion upthread on exhausting pg_wal before switching - https://www.postgresql.org/message-id/20230119005014.GA3838170%40nathanxps13. Even today, the standby exhausts pg_wal before switching from the archive to streaming. > I tested this in a basic RR setup without replication slots (e.g. log > shipping) where the > WAL is available in the archive but the primary always has the WAL > rotated out and > 'streaming_replication_retry_interval = 1'. This leads the RR to > become stuck where it stops fetching from > archive and loops between XLOG_FROM_PG_WAL and XLOG_FROM_STREAM. Nice catch. This is a problem. One idea is to disable the streaming_replication_retry_interval feature for slot-less streaming replication - either disallow the GUC to be set in the assign_hook when primary_slot_name isn't specified, or check for it when deciding to switch the WAL source. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Aug 29, 2024 at 6:32 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > In synchronous replication setup, until standby finishes fetching WAL > from the archive, the commits on the primary have to wait which can > increase the query latency. If the standby can connect to the primary > as soon as the broken connection is restored, it can fetch the WAL > soon and transaction commits can continue on the primary. Is my > understanding correct? Is there anything more to this? > Yup, if you're running with synchronous_commit = 'on' with synchronous replicas, then you can have the replica continue streaming changes into pg_wal faster than WAL replay, so commits may be unblocked faster. > I talked to Michael Paquier at PGConf.Dev 2024 and got some concerns > about this feature for dealing with changing timelines. I can't think > of them right now. I'm not sure what the risk would be if the WAL/history files we sync from streaming are the same as what we replay from archive. > And, there were some cautions raised upthread - > https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13 > and https://www.postgresql.org/message-id/ZffaQt7UbM2Q9kYh%40paquier.xyz. Yup agreed. I need to understand this area a lot better before I can do a more in-depth review. > Interesting. Yes, the restore script has to be smarter to detect the > broken connections and distinguish whether the server is performing > just the archive recovery/PITR or streaming from standby. Not doing it > right, perhaps, can cause data loss (?). I don't think there would be data loss; replay would only be stuck or slowed down. It wouldn't be any different today if the restore script returned a non-zero exit status. The end-user could configure their restore script to return a non-zero status, based on some condition, to move to streaming. > > > However, > > > + * exhaust all the WAL present in pg_wal before switching. 
If successful, > > > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls > > > + * back to XLOG_FROM_ARCHIVE state. > > > > I think I'm missing how this happens. Or what "successful" means. If I'm reading > > it right, no matter what happens we will always move to > > XLOG_FROM_STREAM based on how > > the state machine works? > > Please have a look at some discussion upthread on exhausting pg_wal > before switching - > https://www.postgresql.org/message-id/20230119005014.GA3838170%40nathanxps13. > Even today, the standby exhausts pg_wal before switching to streaming > from the archive. > I'm getting caught on the word "successful". My rough understanding of WaitForWALToBecomeAvailable is that once you're in XLOG_FROM_PG_WAL, if it was unsuccessful for whatever reason, it will still transition to XLOG_FROM_STREAM. It does not loop back to XLOG_FROM_ARCHIVE if XLOG_FROM_PG_WAL fails. > Nice catch. This is a problem. One idea is to disable > streaming_replication_retry_interval feature for slot-less streaming > replication - either when primary_slot_name isn't specified disallow > the GUC to be set in assign_hook or when deciding to switch the wal > source. Thoughts? I don't think it's dependent on slot-less streaming. You would also run into the issue if the WAL is no longer there on the primary, which can occur with 'max_slot_wal_keep_size' as well. IMO the guarantee we need to make is that when we transition from XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE for a "fresh start", we should attempt to restore from archive at least once. I think this means that wal_source_switch_state should be reset back to SWITCH_TO_STREAMING_NONE whenever we transition to XLOG_FROM_ARCHIVE. We've attempted the switch to streaming once, so let's not continually re-try if it failed. Thanks, -- John Hsu - Amazon Web Services