Thread: Synchronizing slots from primary to standby
I want to reactivate $subject. I took Petr Jelinek's patch from [0], rebased it, added a bit of testing. It basically works, but as mentioned in [0], there are various issues to work out.

The idea is that the standby runs a background worker to periodically fetch replication slot information from the primary. On failover, a logical subscriber would then ideally find up-to-date replication slots on the new publisher and can just continue normally.

The previous thread didn't have a lot of discussion, but I have gathered from off-line conversations that there is a wider agreement on this approach. So the next steps would be to make it more robust and configurable and documented.

As I said, I added a small test case to show that it works at all, but I think a lot more tests should be added. I have also found that this breaks some seemingly unrelated tests in the recovery test suite. I have disabled these here. I'm not sure if the patch actually breaks anything or if these are just differences in timing or implementation dependencies.

This patch adds a LIST_SLOTS replication command, but I think this could be replaced with just a SELECT FROM pg_replication_slots query now. (This patch is originally older than when you could run SELECT queries over the replication protocol.)

So, again, this isn't anywhere near ready, but there is already a lot here to gather feedback about how it works, how it should work, how to configure it, and how it fits into an overall replication and HA architecture.

[0]: https://www.postgresql.org/message-id/flat/3095349b-44d4-bf11-1b33-7eefb585d578%402ndquadrant.com
Attachment
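As a point of reference for the LIST_SLOTS-versus-SELECT question above, this is roughly the kind of query a standby could run instead, over a logical replication connection (replication=database), which accepts plain SQL; the column list is only a sketch of what a sync worker would need:

    SELECT slot_name, plugin, slot_type, database,
           restart_lsn, confirmed_flush_lsn, catalog_xmin
    FROM pg_catalog.pg_replication_slots
    WHERE slot_type = 'logical';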
On Sun, Oct 31, 2021 at 7:08 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > rebased it, added a bit of testing. It basically works, but as > mentioned in [0], there are various issues to work out. Thank you for working on this feature! > > The idea is that the standby runs a background worker to periodically > fetch replication slot information from the primary. On failover, a > logical subscriber would then ideally find up-to-date replication slots > on the new publisher and can just continue normally. > > I have also found that this breaks some seemingly unrelated tests in > the recovery test suite. I have disabled these here. I'm not sure if > the patch actually breaks anything or if these are just differences in > timing or implementation dependencies. I haven’t looked at the patch deeply but regarding 007_sync_rep.pl, the tests seem to fail since the tests rely on the order of the wal sender array on the shared memory. Since a background worker for synchronizing replication slots periodically connects to the walsender on the primary and disconnects, it breaks the assumption of the order. Regarding 010_logical_decoding_timelines.pl, I guess that the patch breaks the test because the background worker for synchronizing slots on the replica periodically advances the replica's slot. I think we need to have a way to disable the slot synchronization or to specify the slot name to sync with the primary. I'm not sure we already discussed this topic but I think we need it at least for testing purposes. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Hi all, Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > rebased it, added a bit of testing. It basically works, but as > mentioned in [0], there are various issues to work out. Thanks for working on that topic, I believe it's an important part of Postgres' HA story. > The idea is that the standby runs a background worker to periodically fetch > replication slot information from the primary. On failover, a logical > subscriber would then ideally find up-to-date replication slots on the new > publisher and can just continue normally. Is there a case to be made about doing the same thing for physical replication slots too? That's what pg_auto_failover [1] does by default: it creates replication slots on every node for every other node, in a way that a standby Postgres instance now maintains a replication slot for the primary. This ensures that after a promotion, the standby knows to retain any and all WAL segments that the primary might need when rejoining, at pg_rewind time. > The previous thread didn't have a lot of discussion, but I have gathered > from off-line conversations that there is a wider agreement on this > approach. So the next steps would be to make it more robust and > configurable and documented. I suppose part of the configuration would then include taking care of physical slots. Some people might want to turn that off and use the Postgres 13+ ability to use the remote primary restore_command to fetch missing WAL files, instead. Well, people who have setup an archiving system, anyway. > As I said, I added a small test case to > show that it works at all, but I think a lot more tests should be added. I > have also found that this breaks some seemingly unrelated tests in the > recovery test suite. I have disabled these here. I'm not sure if the patch > actually breaks anything or if these are just differences in timing or > implementation dependencies. This patch adds a LIST_SLOTS replication > command, but I think this could be replaced with just a SELECT FROM > pg_replication_slots query now. (This patch is originally older than when > you could run SELECT queries over the replication protocol.) Given the admitted state of the patch, I didn't focus on tests. I could successfully apply the patch on-top of current master's branch, and cleanly compile and `make check`. Then I also updated pg_auto_failover to support Postgres 15devel [2] so that I could then `make NODES=3 cluster` there and play with the new replication command: $ psql -d "port=5501 replication=1" -c "LIST_SLOTS;" psql:/Users/dim/.psqlrc:24: ERROR: XX000: cannot execute SQL commands in WAL sender for physical replication LOCATION: exec_replication_command, walsender.c:1830 ... I'm not too sure about this idea of running SQL in a replication protocol connection that you're mentioning, but I suppose that's just me needing to brush up on the topic. > So, again, this isn't anywhere near ready, but there is already a lot here > to gather feedback about how it works, how it should work, how to configure > it, and how it fits into an overall replication and HA architecture. Maybe the first question about configuration would be about selecting which slots a standby should maintain from the primary. Is it all of the slots that exists on both the nodes, or a sublist of that? 
Is it possible to have a slot with the same name on a primary and a standby node, in a way that the standby's slot would be a completely separate entity from the primary's slot? If yes (I just don't know at the moment), well then, should we continue to allow that? Other aspects of the configuration might include a list of databases in which to make the new background worker active, and the polling delay, etc. Also, do we want to even consider having the slot management on a primary node depend on the ability to sync the advancing on one or more standby nodes? I'm not sure to see that one as a good idea, but maybe we want to kill it publically very early then ;-) Regards, -- dim Author of “The Art of PostgreSQL”, see https://theartofpostgresql.com [1]: https://github.com/citusdata/pg_auto_failover [2]: https://github.com/citusdata/pg_auto_failover/pull/838
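For readers unfamiliar with the pg_auto_failover arrangement Dimitri describes, a minimal sketch of the physical-slot side (slot and host names are illustrative, not from the patch): each node keeps a physical slot for its peer, so that after a promotion the new primary retains the WAL the old primary may need when it rejoins.

    -- run on each node, one slot per peer node
    SELECT pg_create_physical_replication_slot('peer_node_a', true);

    -- the peer then attaches to that slot when (re)joining as a standby:
    --   primary_conninfo  = 'host=new-primary.example.com user=repl'
    --   primary_slot_name = 'peer_node_a'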
On Sun, Oct 31, 2021 at 3:38 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > rebased it, added a bit of testing. It basically works, but as > mentioned in [0], there are various issues to work out. > > The idea is that the standby runs a background worker to periodically > fetch replication slot information from the primary. On failover, a > logical subscriber would then ideally find up-to-date replication slots > on the new publisher and can just continue normally. > > The previous thread didn't have a lot of discussion, but I have gathered > from off-line conversations that there is a wider agreement on this > approach. So the next steps would be to make it more robust and > configurable and documented. As I said, I added a small test case to > show that it works at all, but I think a lot more tests should be added. > I have also found that this breaks some seemingly unrelated tests in > the recovery test suite. I have disabled these here. I'm not sure if > the patch actually breaks anything or if these are just differences in > timing or implementation dependencies. This patch adds a LIST_SLOTS > replication command, but I think this could be replaced with just a > SELECT FROM pg_replication_slots query now. (This patch is originally > older than when you could run SELECT queries over the replication protocol.) > > So, again, this isn't anywhere near ready, but there is already a lot > here to gather feedback about how it works, how it should work, how to > configure it, and how it fits into an overall replication and HA > architecture. > > > [0]: > https://www.postgresql.org/message-id/flat/3095349b-44d4-bf11-1b33-7eefb585d578%402ndquadrant.com Thanks for working on this patch. This feature will be useful as it avoids manual intervention during the failover. Here are some thoughts: 1) Instead of a new LIST_SLOT command, can't we use READ_REPLICATION_SLOT (slight modifications needs to be done to make it support logical replication slots and to get more information from the subscriber). 2) How frequently the new bg worker is going to sync the slot info? How can it ensure that the latest information exists say when the subscriber is down/crashed before it picks up the latest slot information? 3) Instead of the subscriber pulling the slot info, why can't the publisher (via the walsender or a new bg worker maybe?) push the latest slot info? I'm not sure we want to add more functionality to the walsender, if yes, isn't it going to be much simpler? 4) IIUC, the proposal works only for logical replication slots but do you also see the need for supporting some kind of synchronization of physical replication slots as well? IMO, we need a better and consistent way for both types of replication slots. If the walsender can somehow push the slot info from the primary (for physical replication slots)/publisher (for logical replication slots) to the standby/subscribers, this will be a more consistent and simplistic design. However, I'm not sure if this design is doable at all. Regards, Bharath Rupireddy.
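For reference on point 1, READ_REPLICATION_SLOT in the current PG 15 development tree reports only a few physical-slot fields (the slot type, restart_lsn, and the timeline), which is why it would need the extensions Bharath mentions before it could serve logical slot synchronization. A sketch of its use over a replication connection, with an illustrative slot name:

    $ psql -d "port=5501 replication=1" -c "READ_REPLICATION_SLOT standby1_slot;"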
> 3) Instead of the subscriber pulling the slot info, why can't the
> publisher (via the walsender or a new bg worker maybe?) push the
> latest slot info? I'm not sure we want to add more functionality to
> the walsender, if yes, isn't it going to be much simpler?
The standby pulling the information, or at least making the first attempt to connect to the primary, is a better design, as the primary doesn't need to spend its cycles repeatedly connecting to an unreachable standby. In fact, the primary wouldn't even need to know about its followers, for example log-shipping standbys.
On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
>> 3) Instead of the subscriber pulling the slot info, why can't the
>> publisher (via the walsender or a new bg worker maybe?) push the
>> latest slot info? I'm not sure we want to add more functionality to
>> the walsender, if yes, isn't it going to be much simpler?
>
> Standby pulling the information or at least making a first attempt to connect to the primary is a better design as primary doesn't need to spend its cycles repeatedly connecting to an unreachable standby. In fact, primary wouldn't even need to know the followers, for example followers / log shipping standbys

My idea was to let the existing walsender from the primary/publisher to send the slot info (both logical and physical replication slots) to the standby/subscriber, probably by piggybacking the slot info with the WAL currently it sends. Having said that, I don't know the feasibility of it. Anyways, I'm not in favour of having a new bg worker to just ship the slot info. The standby/subscriber, while making connection to primary/publisher, can choose to get the replication slot info.

As I said upthread, the problem I see with standby/subscriber pulling the info is that: how frequently the standby/subscriber is going to sync the slot info from primary/publisher? How can it ensure that the latest information exists say when the subscriber is down/crashed before it picks up the latest slot information?

IIUC, the initial idea proposed in this patch deals with only logical replication slots, not the physical replication slots; what I'm thinking is to have a generic way to deal with both of them.

Note: In the above description, I used primary-standby and publisher-subscriber to represent the physical and logical replication slots respectively.

Regards, Bharath Rupireddy.
On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM > <satyanarlapuram@gmail.com> wrote: > > > >> 3) Instead of the subscriber pulling the slot info, why can't the > >> publisher (via the walsender or a new bg worker maybe?) push the > >> latest slot info? I'm not sure we want to add more functionality to > >> the walsender, if yes, isn't it going to be much simpler? > > > > Standby pulling the information or at least making a first attempt to connect to the primary is a better design as primarydoesn't need to spend its cycles repeatedly connecting to an unreachable standby. In fact, primary wouldn't even needto know the followers, for example followers / log shipping standbys > > My idea was to let the existing walsender from the primary/publisher > to send the slot info (both logical and physical replication slots) to > the standby/subscriber, probably by piggybacking the slot info with > the WAL currently it sends. Having said that, I don't know the > feasibility of it. Anyways, I'm not in favour of having a new bg > worker to just ship the slot info. The standby/subscriber, while > making connection to primary/publisher, can choose to get the > replication slot info. I think it is possible that the standby is restoring the WAL directly from the archive location and there might not be any wal sender at time. So I think the idea of standby pulling the WAL looks better to me. > As I said upthread, the problem I see with standby/subscriber pulling > the info is that: how frequently the standby/subscriber is going to > sync the slot info from primary/publisher? How can it ensure that the > latest information exists say when the subscriber is down/crashed > before it picks up the latest slot information? Yeah that is a good question that how frequently the subscriber should fetch the slot information, I think that should be configurable values. And the time delay is more, the chances of losing the latest slot is more. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 29, 2021 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM > > <satyanarlapuram@gmail.com> wrote: > > > > > >> 3) Instead of the subscriber pulling the slot info, why can't the > > >> publisher (via the walsender or a new bg worker maybe?) push the > > >> latest slot info? I'm not sure we want to add more functionality to > > >> the walsender, if yes, isn't it going to be much simpler? > > > > > > Standby pulling the information or at least making a first attempt to connect to the primary is a better design asprimary doesn't need to spend its cycles repeatedly connecting to an unreachable standby. In fact, primary wouldn't evenneed to know the followers, for example followers / log shipping standbys > > > > My idea was to let the existing walsender from the primary/publisher > > to send the slot info (both logical and physical replication slots) to > > the standby/subscriber, probably by piggybacking the slot info with > > the WAL currently it sends. Having said that, I don't know the > > feasibility of it. Anyways, I'm not in favour of having a new bg > > worker to just ship the slot info. The standby/subscriber, while > > making connection to primary/publisher, can choose to get the > > replication slot info. > > I think it is possible that the standby is restoring the WAL directly > from the archive location and there might not be any wal sender at > time. So I think the idea of standby pulling the WAL looks better to > me. My point was that why can't we let the walreceiver (of course users can configure it on the standby/subscriber) to choose whether or not to receive the replication (both physical and logical) slot info from the primary/publisher and if yes, the walsender(on the primary/publisher) sending it probably as a new WAL record or just piggybacking the replication slot info with any of the existing WAL records. Or simply a common bg worker (as opposed to the bg worker proposed originally in this thread which, IIUC, works for logical replication) running on standby/subscriber for getting both the physical and logical replication slots info. > > As I said upthread, the problem I see with standby/subscriber pulling > > the info is that: how frequently the standby/subscriber is going to > > sync the slot info from primary/publisher? How can it ensure that the > > latest information exists say when the subscriber is down/crashed > > before it picks up the latest slot information? > > Yeah that is a good question that how frequently the subscriber should > fetch the slot information, I think that should be configurable > values. And the time delay is more, the chances of losing the latest > slot is more. I agree that it should be configurable. Even if the primary/publisher is down/crashed, one can still compare the latest slot info from both the primary/publisher and standby/subscriber using a new tool pg_replslotdata proposed at [1] and see how far and which slots missed the latest replication slot info and probably drop those alone to recreate and retain other slots as is. [1] - https://www.postgresql.org/message-id/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com Regards, Bharath Rupireddy.
On Mon, Nov 29, 2021 at 12:19 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Nov 29, 2021 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM > > > <satyanarlapuram@gmail.com> wrote: > > > > > > > >> 3) Instead of the subscriber pulling the slot info, why can't the > > > >> publisher (via the walsender or a new bg worker maybe?) push the > > > >> latest slot info? I'm not sure we want to add more functionality to > > > >> the walsender, if yes, isn't it going to be much simpler? > > > > > > > > Standby pulling the information or at least making a first attempt to connect to the primary is a better designas primary doesn't need to spend its cycles repeatedly connecting to an unreachable standby. In fact, primary wouldn'teven need to know the followers, for example followers / log shipping standbys > > > > > > My idea was to let the existing walsender from the primary/publisher > > > to send the slot info (both logical and physical replication slots) to > > > the standby/subscriber, probably by piggybacking the slot info with > > > the WAL currently it sends. Having said that, I don't know the > > > feasibility of it. Anyways, I'm not in favour of having a new bg > > > worker to just ship the slot info. The standby/subscriber, while > > > making connection to primary/publisher, can choose to get the > > > replication slot info. > > > > I think it is possible that the standby is restoring the WAL directly > > from the archive location and there might not be any wal sender at > > time. So I think the idea of standby pulling the WAL looks better to > > me. > > My point was that why can't we let the walreceiver (of course users > can configure it on the standby/subscriber) to choose whether or not > to receive the replication (both physical and logical) slot info from > the primary/publisher and if yes, the walsender(on the > primary/publisher) sending it probably as a new WAL record or just > piggybacking the replication slot info with any of the existing WAL > records. Okay, I thought your point was that the primary pushing is better over standby pulling the slot info, but now it seems that you also agree that standby pulling is better right? Now it appears your point is about whether we will use the same connection for pulling the slot information which we are using for streaming the data or any other connection? I mean in this patch also we are creating a replication connection and pulling the slot information over there, just point is we are starting a separate worker for pulling the slot information, and I think that approach is better as this will not impact the performance of the other replication connection which we are using for communicating the data. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 29, 2021 at 1:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Nov 29, 2021 at 12:19 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Mon, Nov 29, 2021 at 11:14 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Mon, Nov 29, 2021 at 9:40 AM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > > On Mon, Nov 29, 2021 at 1:48 AM SATYANARAYANA NARLAPURAM > > > > <satyanarlapuram@gmail.com> wrote: > > > > > > > > > >> 3) Instead of the subscriber pulling the slot info, why can't the > > > > >> publisher (via the walsender or a new bg worker maybe?) push the > > > > >> latest slot info? I'm not sure we want to add more functionality to > > > > >> the walsender, if yes, isn't it going to be much simpler? > > > > > > > > > > Standby pulling the information or at least making a first attempt to connect to the primary is a better designas primary doesn't need to spend its cycles repeatedly connecting to an unreachable standby. In fact, primary wouldn'teven need to know the followers, for example followers / log shipping standbys > > > > > > > > My idea was to let the existing walsender from the primary/publisher > > > > to send the slot info (both logical and physical replication slots) to > > > > the standby/subscriber, probably by piggybacking the slot info with > > > > the WAL currently it sends. Having said that, I don't know the > > > > feasibility of it. Anyways, I'm not in favour of having a new bg > > > > worker to just ship the slot info. The standby/subscriber, while > > > > making connection to primary/publisher, can choose to get the > > > > replication slot info. > > > > > > I think it is possible that the standby is restoring the WAL directly > > > from the archive location and there might not be any wal sender at > > > time. So I think the idea of standby pulling the WAL looks better to > > > me. > > > > My point was that why can't we let the walreceiver (of course users > > can configure it on the standby/subscriber) to choose whether or not > > to receive the replication (both physical and logical) slot info from > > the primary/publisher and if yes, the walsender(on the > > primary/publisher) sending it probably as a new WAL record or just > > piggybacking the replication slot info with any of the existing WAL > > records. > > Okay, I thought your point was that the primary pushing is better over > standby pulling the slot info, but now it seems that you also agree > that standby pulling is better right? Now it appears your point is > about whether we will use the same connection for pulling the slot > information which we are using for streaming the data or any other > connection? I mean in this patch also we are creating a replication > connection and pulling the slot information over there, just point is > we are starting a separate worker for pulling the slot information, > and I think that approach is better as this will not impact the > performance of the other replication connection which we are using for > communicating the data. The easiest way to implement this feature so far, is to use a common bg worker (as opposed to the bg worker proposed originally in this thread which, IIUC, works for logical replication) running on standby (in case of streaming replication with physical replication slots) or subscriber (in case of logical replication with logical replication slots) for getting both the physical and logical replication slots info from the primary or publisher. 
This bg worker requires at least two GUCs: 1) one to enable/disable the worker, and 2) one to define the slot sync interval (the bg worker fetches the slot info once per sync interval). Thoughts? Regards, Bharath Rupireddy.
On 31.10.21 11:08, Peter Eisentraut wrote: > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > rebased it, added a bit of testing. It basically works, but as > mentioned in [0], there are various issues to work out. > > The idea is that the standby runs a background worker to periodically > fetch replication slot information from the primary. On failover, a > logical subscriber would then ideally find up-to-date replication slots > on the new publisher and can just continue normally. > So, again, this isn't anywhere near ready, but there is already a lot > here to gather feedback about how it works, how it should work, how to > configure it, and how it fits into an overall replication and HA > architecture. Here is an updated patch. The main changes are that I added two configuration parameters. The first, synchronize_slot_names, is set on the physical standby to specify which slots to sync from the primary. By default, it is empty. (This also fixes the recovery test failures that I had to disable in the previous patch version.) The second, standby_slot_names, is set on the primary. It holds back logical replication until the listed physical standbys have caught up. That way, when failover is necessary, the promoted standby is not behind the logical replication consumers. In principle, this works now, I think. I haven't made much progress in creating more test cases for this; that's something that needs more attention. It's worth pondering what the configuration language for standby_slot_names should be. Right now, it's just a list of slots that all need to be caught up. More complicated setups are conceivable. Maybe you have standbys S1 and S2 that are potential failover targets for logical replication consumers L1 and L2, and also standbys S3 and S4 that are potential failover targets for logical replication consumers L3 and L4. Viewed like that, this setting could be a replication slot setting. The setting might also have some relationship with synchronous_standby_names. Like, if you have synchronous_standby_names set, then that's a pretty good indication that you also want some or all of those standbys in standby_slot_names. (But note that one is slots and one is application names.) So there are a variety of possibilities.
Attachment
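To make the two new settings concrete, here is a minimal configuration sketch based on the description above; slot names and connection details are illustrative:

    # on the primary: hold back logical replication until these physical
    # standbys (identified by their replication slot names) have caught up
    standby_slot_names = 'standby1_slot'

    # on the physical standby: which logical slots to mirror from the
    # primary (empty by default, i.e. no synchronization)
    synchronize_slot_names = 'sub1_slot'
    primary_conninfo = 'host=primary.example.com user=repl'
    primary_slot_name = 'standby1_slot'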
On 24.11.21 07:11, Masahiko Sawada wrote: > I haven’t looked at the patch deeply but regarding 007_sync_rep.pl, > the tests seem to fail since the tests rely on the order of the wal > sender array on the shared memory. Since a background worker for > synchronizing replication slots periodically connects to the walsender > on the primary and disconnects, it breaks the assumption of the order. > Regarding 010_logical_decoding_timelines.pl, I guess that the patch > breaks the test because the background worker for synchronizing slots > on the replica periodically advances the replica's slot. I think we > need to have a way to disable the slot synchronization or to specify > the slot name to sync with the primary. I'm not sure we already > discussed this topic but I think we need it at least for testing > purposes. This has been addressed by patch v2 that adds such a setting.
On 24.11.21 17:25, Dimitri Fontaine wrote: > Is there a case to be made about doing the same thing for physical > replication slots too? It has been considered. At the moment, I'm not doing it, because it would add more code and complexity and it's not that important. But it could be added in the future. > Given the admitted state of the patch, I didn't focus on tests. I could > successfully apply the patch on-top of current master's branch, and > cleanly compile and `make check`. > > Then I also updated pg_auto_failover to support Postgres 15devel [2] so > that I could then `make NODES=3 cluster` there and play with the new > replication command: > > $ psql -d "port=5501 replication=1" -c "LIST_SLOTS;" > psql:/Users/dim/.psqlrc:24: ERROR: XX000: cannot execute SQL commands in WAL sender for physical replication > LOCATION: exec_replication_command, walsender.c:1830 > ... > > I'm not too sure about this idea of running SQL in a replication > protocol connection that you're mentioning, but I suppose that's just me > needing to brush up on the topic. FWIW, the way the replication command parser works, if there is a parse error, it tries to interpret the command as a plain SQL command. But that only works for logical replication connections. So in physical replication, if you try to run anything that does not parse, you will get this error. But that has nothing to do with this feature. The above command works for me, so maybe something else went wrong in your situation. > Maybe the first question about configuration would be about selecting > which slots a standby should maintain from the primary. Is it all of the > slots that exists on both the nodes, or a sublist of that? > > Is it possible to have a slot with the same name on a primary and a > standby node, in a way that the standby's slot would be a completely > separate entity from the primary's slot? If yes (I just don't know at > the moment), well then, should we continue to allow that? This has been added in v2. > Also, do we want to even consider having the slot management on a > primary node depend on the ability to sync the advancing on one or more > standby nodes? I'm not sure to see that one as a good idea, but maybe we > want to kill it publically very early then ;-) I don't know what you mean by this.
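To spell out the parser behavior described above, the difference is between the two kinds of replication connections (connection strings are illustrative; the error text is the one from Dimitri's session):

    # physical walsender: anything that does not parse as a replication
    # command falls through to this error
    $ psql -d "port=5501 replication=1" -c "SELECT 1;"
    ERROR:  cannot execute SQL commands in WAL sender for physical replication

    # logical walsender (replication=database): plain SQL is accepted, which
    # is what would allow SELECT ... FROM pg_replication_slots to replace
    # LIST_SLOTS
    $ psql -d "port=5501 dbname=postgres replication=database" -c "SELECT 1;"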
On 28.11.21 07:52, Bharath Rupireddy wrote: > 1) Instead of a new LIST_SLOT command, can't we use > READ_REPLICATION_SLOT (slight modifications needs to be done to make > it support logical replication slots and to get more information from > the subscriber). I looked at that but didn't see an obvious way to consolidate them. This is something we could look at again later. > 2) How frequently the new bg worker is going to sync the slot info? > How can it ensure that the latest information exists say when the > subscriber is down/crashed before it picks up the latest slot > information? The interval is currently hardcoded, but could be a configuration setting. In the v2 patch, there is a new setting that orders physical replication before logical so that the logical subscribers cannot get ahead of the physical standby. > 3) Instead of the subscriber pulling the slot info, why can't the > publisher (via the walsender or a new bg worker maybe?) push the > latest slot info? I'm not sure we want to add more functionality to > the walsender, if yes, isn't it going to be much simpler? This sounds like the failover slot feature, which was rejected.
Hello,

I started taking a brief look at the v2 patch, and it does appear to work for the basic case. The logical slot is synchronized across, and I can connect to the promoted standby and stream changes afterwards.

It's not clear to me what the correct behavior is when a logical slot has been synced to the replica and then gets deleted on the writer. Would we expect this to be propagated, or leave it up to the end-user to manage?

> + rawname = pstrdup(standby_slot_names);
> + SplitIdentifierString(rawname, ',', &namelist);
> +
> + while (true)
> + {
> + int wait_slots_remaining;
> + XLogRecPtr oldest_flush_pos = InvalidXLogRecPtr;
> + int rc;
> +
> + wait_slots_remaining = list_length(namelist);
> +
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> + for (int i = 0; i < max_replication_slots; i++)
> + {

Even though standby_slot_names is PGC_SIGHUP, we never reload/re-process the value. If we have a wrong entry in there, the backend becomes stuck until we re-establish the logical connection. Adding "postmaster/interrupt.h" with ConfigReloadPending/ProcessConfigFile does seem to work.

Another thing I noticed is that once it starts waiting in this block, Ctrl+C doesn't seem to terminate the backend?

pg_recvlogical -d postgres -p 5432 --slot regression_slot --start -f -
..
^Cpg_recvlogical: error: unexpected termination of replication stream:

The logical backend connection is still present:

ps aux | grep 51263
hsuchen 51263 80.7 0.0 320180 14304 ? Rs 01:11 3:04 postgres: walsender hsuchen [local] START_REPLICATION

pstack 51263
#0 0x00007ffee99e79a5 in clock_gettime ()
#1 0x00007f8705e88246 in clock_gettime () from /lib64/libc.so.6
#2 0x000000000075f141 in WaitEventSetWait ()
#3 0x000000000075f565 in WaitLatch ()
#4 0x0000000000720aea in ReorderBufferProcessTXN ()
#5 0x00000000007142a6 in DecodeXactOp ()
#6 0x000000000071460f in LogicalDecodingProcessRecord ()

It can be terminated with a pg_terminate_backend though.

If we have a physical slot with name foo on the standby, and then a logical slot is created on the writer with the same slot_name, it does error out on the replica, although it prevents other slots from being synchronized, which is probably fine.

2021-12-16 02:10:29.709 UTC [73788] LOG: replication slot synchronization worker for database "postgres" has started
2021-12-16 02:10:29.713 UTC [73788] ERROR: cannot use physical replication slot for logical decoding
2021-12-16 02:10:29.714 UTC [73037] DEBUG: unregistering background worker "replication slot synchronization worker"

On 12/14/21, 2:26 PM, "Peter Eisentraut" <peter.eisentraut@enterprisedb.com> wrote:

On 28.11.21 07:52, Bharath Rupireddy wrote:
> 1) Instead of a new LIST_SLOT command, can't we use
> READ_REPLICATION_SLOT (slight modifications needs to be done to make
> it support logical replication slots and to get more information from
> the subscriber).

I looked at that but didn't see an obvious way to consolidate them. This is something we could look at again later.

> 2) How frequently the new bg worker is going to sync the slot info?
> How can it ensure that the latest information exists say when the
> subscriber is down/crashed before it picks up the latest slot
> information?

The interval is currently hardcoded, but could be a configuration setting.
In the v2 patch, there is a new setting that orders physical replication before logical so that the logical subscribers cannot get ahead of the physical standby. > 3) Instead of the subscriber pulling the slot info, why can't the > publisher (via the walsender or a new bg worker maybe?) push the > latest slot info? I'm not sure we want to add more functionality to > the walsender, if yes, isn't it going to be much simpler? This sounds like the failover slot feature, which was rejected.
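For what it's worth, a sketch of the reload handling suggested above, following the pattern other PostgreSQL wait loops use; this is not code from the patch, and the surrounding loop is abbreviated (the file already has what WaitLatch itself needs):

    #include "miscadmin.h"
    #include "postmaster/interrupt.h"
    #include "utils/guc.h"

    /* inside the wait loop of wait_for_standby_confirmation() */
    for (;;)
    {
        /* allow query cancel / pg_terminate_backend() to be processed */
        CHECK_FOR_INTERRUPTS();

        /* pick up changes to standby_slot_names after a SIGHUP */
        if (ConfigReloadPending)
        {
            ConfigReloadPending = false;
            ProcessConfigFile(PGC_SIGHUP);
            /* re-split standby_slot_names before re-checking the slots */
        }

        /* ... check slot positions, then wait as the patch already does ... */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         1000L, PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
    }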
Here is an updated patch to fix some build failures. No feature changes. On 14.12.21 23:12, Peter Eisentraut wrote: > On 31.10.21 11:08, Peter Eisentraut wrote: >> I want to reactivate $subject. I took Petr Jelinek's patch from [0], >> rebased it, added a bit of testing. It basically works, but as >> mentioned in [0], there are various issues to work out. >> >> The idea is that the standby runs a background worker to periodically >> fetch replication slot information from the primary. On failover, a >> logical subscriber would then ideally find up-to-date replication >> slots on the new publisher and can just continue normally. > >> So, again, this isn't anywhere near ready, but there is already a lot >> here to gather feedback about how it works, how it should work, how to >> configure it, and how it fits into an overall replication and HA >> architecture. > > Here is an updated patch. The main changes are that I added two > configuration parameters. The first, synchronize_slot_names, is set on > the physical standby to specify which slots to sync from the primary. By > default, it is empty. (This also fixes the recovery test failures that > I had to disable in the previous patch version.) The second, > standby_slot_names, is set on the primary. It holds back logical > replication until the listed physical standbys have caught up. That > way, when failover is necessary, the promoted standby is not behind the > logical replication consumers. > > In principle, this works now, I think. I haven't made much progress in > creating more test cases for this; that's something that needs more > attention. > > It's worth pondering what the configuration language for > standby_slot_names should be. Right now, it's just a list of slots that > all need to be caught up. More complicated setups are conceivable. > Maybe you have standbys S1 and S2 that are potential failover targets > for logical replication consumers L1 and L2, and also standbys S3 and S4 > that are potential failover targets for logical replication consumers L3 > and L4. Viewed like that, this setting could be a replication slot > setting. The setting might also have some relationship with > synchronous_standby_names. Like, if you have synchronous_standby_names > set, then that's a pretty good indication that you also want some or all > of those standbys in standby_slot_names. (But note that one is slots > and one is application names.) So there are a variety of possibilities.
Attachment
On Wed, Dec 15, 2021 at 7:13 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 31.10.21 11:08, Peter Eisentraut wrote: > > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > > rebased it, added a bit of testing. It basically works, but as > > mentioned in [0], there are various issues to work out. > > > > The idea is that the standby runs a background worker to periodically > > fetch replication slot information from the primary. On failover, a > > logical subscriber would then ideally find up-to-date replication slots > > on the new publisher and can just continue normally. > > > So, again, this isn't anywhere near ready, but there is already a lot > > here to gather feedback about how it works, how it should work, how to > > configure it, and how it fits into an overall replication and HA > > architecture. > > The second, > standby_slot_names, is set on the primary. It holds back logical > replication until the listed physical standbys have caught up. That > way, when failover is necessary, the promoted standby is not behind the > logical replication consumers. I might be missing something but isn’t it okay even if the new primary server is behind the subscribers? IOW, even if two slot's LSNs (i.e., restart_lsn and confirm_flush_lsn) are behind the subscriber's remote LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only transactions that were committed after the remote_lsn. So the subscriber can resume logical replication with the new primary without any data loss. The new primary should not be ahead of the subscribers because it forwards the logical replication start LSN to the slot’s confirm_flush_lsn in this case. But it cannot happen since the remote LSN of the subscriber’s origin is always updated first, then the confirm_flush_lsn of the slot on the primary is updated, and then the confirm_flush_lsn of the slot on the standby is synchronized. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
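For readers following the LSN bookkeeping in this argument, these are the positions being compared; a quick way to inspect them (queries are illustrative):

    -- on the subscriber: how far each origin has been replayed
    SELECT external_id, remote_lsn, local_lsn
    FROM pg_replication_origin_status;

    -- on the publisher (and, once synced, on the standby): the slot's state
    SELECT slot_name, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';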
> I might be missing something but isn’t it okay even if the new primary
> server is behind the subscribers? IOW, even if two slot's LSNs (i.e.,
> restart_lsn and confirm_flush_lsn) are behind the subscriber's remote
> LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only
> transactions that were committed after the remote_lsn. So the
> subscriber can resume logical replication with the new primary without
> any data loss.

Maybe I'm misreading, but I thought the purpose of this was to make sure that the logical subscriber does not have data that has not been replicated to the new primary. The use-case I can think of would be if synchronous_commit were enabled and fail-over occurs. If we didn't have this set, isn't it possible that this logical subscriber has extra commits that aren't present on the newly promoted primary?

And sorry, I accidentally started a new thread in my last reply. Re-pasting some of my previous questions/comments:

wait_for_standby_confirmation does not update standby_slot_names once it's in a loop and can't be fixed with SIGHUP. Similarly, synchronize_slot_names isn't updated once the worker is launched.

If a logical slot was dropped on the writer, should the worker drop logical slots that it was previously synchronizing but are no longer present? Or should we leave that to the user to manage? I'm trying to think why users would want to sync logical slots to a reader but not have that be dropped as well if it's no longer present.

Is there a reason we're deciding to use one-worker syncing per database instead of one general worker that syncs across all the databases? I imagine I'm missing something obvious here.

As for how standby_slot_names should be configured, I'd prefer flexibility similar to what we have for synchronous_standby_names, since that seems the most analogous. It'd provide flexibility for failovers, which I imagine is the most common use-case.

On 1/20/22, 9:34 PM, "Masahiko Sawada" <sawada.mshk@gmail.com> wrote:

On Wed, Dec 15, 2021 at 7:13 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 31.10.21 11:08, Peter Eisentraut wrote: > > I want to reactivate $subject. I took Petr Jelinek's patch from [0], > > rebased it, added a bit of testing. It basically works, but as > > mentioned in [0], there are various issues to work out. > > > > The idea is that the standby runs a background worker to periodically > > fetch replication slot information from the primary. On failover, a > > logical subscriber would then ideally find up-to-date replication slots > > on the new publisher and can just continue normally. > > > So, again, this isn't anywhere near ready, but there is already a lot > > here to gather feedback about how it works, how it should work, how to > > configure it, and how it fits into an overall replication and HA > > architecture. > > The second, > standby_slot_names, is set on the primary. It holds back logical > replication until the listed physical standbys have caught up. That > way, when failover is necessary, the promoted standby is not behind the > logical replication consumers.

I might be missing something but isn’t it okay even if the new primary server is behind the subscribers?
IOW, even if two slot's LSNs (i.e., restart_lsn and confirm_flush_lsn) are behind the subscriber's remote LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only transactions that were committed after the remote_lsn. So the subscriber can resume logical replication with the new primary without any data loss. The new primary should not be ahead of the subscribers because it forwards the logical replication start LSN to the slot’s confirm_flush_lsn in this case. But it cannot happen since the remote LSN of the subscriber’s origin is always updated first, then the confirm_flush_lsn of the slot on the primary is updated, and then the confirm_flush_lsn of the slot on the standby is synchronized. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Sat, Jan 22, 2022 at 4:33 AM Hsu, John <hsuchen@amazon.com> wrote: > > > I might be missing something but isn’t it okay even if the new primary > > server is behind the subscribers? IOW, even if two slot's LSNs (i.e., > > restart_lsn and confirm_flush_lsn) are behind the subscriber's remote > > LSN (i.e., pg_replication_origin.remote_lsn), the primary sends only > > transactions that were committed after the remote_lsn. So the > > subscriber can resume logical replication with the new primary without > > any data loss. > > Maybe I'm misreading, but I thought the purpose of this to make > sure that the logical subscriber does not have data that has not been > replicated to the new primary. The use-case I can think of would be > if synchronous_commit were enabled and fail-over occurs. If > we didn't have this set, isn't it possible that this logical subscriber > has extra commits that aren't present on the newly promoted primary? > This is very much possible if the new primary used to be asynchronous standby. But, it seems like the current patch is trying to hold the logical replication until the data has been replicated to the physical standby when synchronous_slot_names is set. This will ensure that the logical subscriber is never ahead of the new primary. However, AFAIU that's not the primary use-case of this patch; instead this is to ensure that the logical subscribers continue getting data from the new primary when the failover occurs. > > If a logical slot was dropped on the writer, should the worker drop logical > slots that it was previously synchronizing but are no longer present? Or > should we leave that to the user to manage? I'm trying to think why users > would want to sync logical slots to a reader but not have that be dropped > as well if it's no longer present. > AFAIU this should be taken care of by the background worker used to synchronize the replication slot. -- With Regards, Ashutosh Sharma.
Hi, On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 > From: Peter Eisentraut <peter@eisentraut.org> > Date: Mon, 3 Jan 2022 14:43:36 +0100 > Subject: [PATCH v3] Synchronize logical replication slots from primary to > standby I've just skimmed the patch and the related threads. As far as I can tell this cannot be safely used without the conflict handling in [1], is that correct? Greetings, Andres Freund [1] https://postgr.es/m/CA%2BTgmoZd-JqNL1-R3RJ0jQRD%2B-dc94X0nPJgh%2BdwdDF0rFuE3g%40mail.gmail.com
Hi Andres, Are you talking about this scenario - what if the logical replication slot on the publisher is dropped, but is being referenced by the standby where the slot is synchronized? Should the redo function for the drop replication slot have the capability to drop it on standby and its subscribers (if any) as well? -- With Regards, Ashutosh Sharma. On Sun, Feb 6, 2022 at 1:29 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 > > From: Peter Eisentraut <peter@eisentraut.org> > > Date: Mon, 3 Jan 2022 14:43:36 +0100 > > Subject: [PATCH v3] Synchronize logical replication slots from primary to > > standby > > I've just skimmed the patch and the related threads. As far as I can tell this > cannot be safely used without the conflict handling in [1], is that correct? > > > Greetings, > > Andres Freund > > [1] https://postgr.es/m/CA%2BTgmoZd-JqNL1-R3RJ0jQRD%2B-dc94X0nPJgh%2BdwdDF0rFuE3g%40mail.gmail.com > >
Hi, On 2022-02-07 13:38:38 +0530, Ashutosh Sharma wrote: > Are you talking about this scenario - what if the logical replication > slot on the publisher is dropped, but is being referenced by the > standby where the slot is synchronized? It's a bit hard to say, because neither in this thread nor in the patch I've found a clear description of what the syncing needs to & tries to guarantee. It might be that that was discussed in one of the precursor threads, but... Generally I don't think we can permit scenarios where a slot can be in a "corrupt" state, i.e. missing required catalog entries, after "normal" administrative commands (i.e. not mucking around in catalog entries / on-disk files). Even if the sequence of commands may be a bit weird. All such cases need to be either prevented or detected. As far as I can tell, the way this patch keeps slots on physical replicas "valid" is solely by reorderbuffer.c blocking during replay via wait_for_standby_confirmation(). Which means that if e.g. the standby_slot_names GUC differs from synchronize_slot_names on the physical replica, the slots synchronized on the physical replica are not going to be valid. Or if the primary drops its logical slots. > Should the redo function for the drop replication slot have the capability > to drop it on standby and its subscribers (if any) as well? Slots are not WAL logged (and shouldn't be). I think you pretty much need the recovery conflict handling infrastructure I referenced upthread, which recognized during replay if a record has a conflict with a slot on a standby. And then ontop of that you can build something like this patch. Greetings, Andres Freund
Hi, On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > +static void > +ApplyLauncherStartSlotSync(TimestampTz *last_start_time, long *wait_time) > +{ > [...] > + > + foreach(lc, slots) > + { > + WalRecvReplicationSlotData *slot_data = lfirst(lc); > + LogicalRepWorker *w; > + > + if (!OidIsValid(slot_data->database)) > + continue; > + > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); > + w = logicalrep_worker_find(slot_data->database, InvalidOid, > + InvalidOid, false); > + LWLockRelease(LogicalRepWorkerLock); > + > + if (w == NULL) > + { > + *last_start_time = now; > + *wait_time = wal_retrieve_retry_interval; > + > + logicalrep_worker_launch(slot_data->database, InvalidOid, NULL, > + BOOTSTRAP_SUPERUSERID, InvalidOid); Do we really need a dedicated worker for each single slot? That seems excessively expensive. > +++ b/src/backend/replication/logical/reorderbuffer.c > [...] > +static void > +wait_for_standby_confirmation(XLogRecPtr commit_lsn) > +{ > + char *rawname; > + List *namelist; > + ListCell *lc; > + XLogRecPtr flush_pos = InvalidXLogRecPtr; > + > + if (strcmp(standby_slot_names, "") == 0) > + return; > + > + rawname = pstrdup(standby_slot_names); > + SplitIdentifierString(rawname, ',', &namelist); > + > + while (true) > + { > + int wait_slots_remaining; > + XLogRecPtr oldest_flush_pos = InvalidXLogRecPtr; > + int rc; > + > + wait_slots_remaining = list_length(namelist); > + > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > + for (int i = 0; i < max_replication_slots; i++) > + { > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > + bool inlist; > + > + if (!s->in_use) > + continue; > + > + inlist = false; > + foreach (lc, namelist) > + { > + char *name = lfirst(lc); > + if (strcmp(name, NameStr(s->data.name)) == 0) > + { > + inlist = true; > + break; > + } > + } > + if (!inlist) > + continue; > + > + SpinLockAcquire(&s->mutex); It doesn't seem like a good idea to perform O(max_replication_slots * #standby_slot_names) work on each decoded commit. Nor to SplitIdentifierString(pstrdup(standby_slot_names)) every time. > + if (s->data.database == InvalidOid) > + /* Physical slots advance restart_lsn on flush and ignore confirmed_flush_lsn */ > + flush_pos = s->data.restart_lsn; > + else > + /* For logical slots we must wait for commit and flush */ > + flush_pos = s->data.confirmed_flush; > + > + SpinLockRelease(&s->mutex); > + > + /* We want to find out the min(flush pos) over all named slots */ > + if (oldest_flush_pos == InvalidXLogRecPtr > + || oldest_flush_pos > flush_pos) > + oldest_flush_pos = flush_pos; > + > + if (flush_pos >= commit_lsn && wait_slots_remaining > 0) > + wait_slots_remaining --; > + } > + LWLockRelease(ReplicationSlotControlLock); > + > + if (wait_slots_remaining == 0) > + return; > + > + rc = WaitLatch(MyLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, > + 1000L, PG_WAIT_EXTENSION); I don't think it's a good idea to block here like this - no walsender specific handling is going to happen. E.g. not sending replies to receivers will cause them to time out. And for the SQL functions this will cause blocking even though the interface expects to return when reaching the end of the WAL - which this pretty much is. I think this needs to be restructured so that you only do the checking of the "up to this point" position when needed, rather than every commit. 
We already *have* a check for not replaying further than the flushed WAL position, see the GetFlushRecPtr() calls in WalSndWaitForWal(), pg_logical_slot_get_changes_guts(). I think you'd basically need to integrate with that, rather than introduce blocking in reorderbuffer.c.

> + if (rc & WL_POSTMASTER_DEATH)
> + proc_exit(1);

Should use WL_EXIT_ON_PM_DEATH these days.

Greetings,

Andres Freund
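As a sketch of the kind of restructuring Andres suggests for the repeated parsing (this is not from the patch; the function and variable names are made up): cache the split list of standby_slot_names in TopMemoryContext and rebuild it only when the GUC's string value changes, instead of pstrdup() + SplitIdentifierString() on every decoded commit. Note that SplitIdentifierString() returns pointers into the string it is given, so that copy has to live as long as the list does.

    #include "utils/memutils.h"
    #include "utils/varlena.h"

    static char *cached_names_guc = NULL;   /* GUC value the cache was built from */
    static char *cached_names_raw = NULL;   /* copy the list elements point into */
    static List *cached_name_list = NIL;

    static List *
    standby_slot_name_list(void)
    {
        if (cached_names_guc == NULL ||
            strcmp(cached_names_guc, standby_slot_names) != 0)
        {
            MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);

            list_free(cached_name_list);
            cached_name_list = NIL;
            if (cached_names_raw)
                pfree(cached_names_raw);
            if (cached_names_guc)
                pfree(cached_names_guc);

            cached_names_guc = pstrdup(standby_slot_names);
            cached_names_raw = pstrdup(standby_slot_names);
            (void) SplitIdentifierString(cached_names_raw, ',', &cached_name_list);

            MemoryContextSwitchTo(oldcxt);
        }

        return cached_name_list;
    }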
On Tue, Feb 8, 2022 at 2:02 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-02-07 13:38:38 +0530, Ashutosh Sharma wrote: > > Are you talking about this scenario - what if the logical replication > > slot on the publisher is dropped, but is being referenced by the > > standby where the slot is synchronized? > > It's a bit hard to say, because neither in this thread nor in the patch I've > found a clear description of what the syncing needs to & tries to > guarantee. It might be that that was discussed in one of the precursor > threads, but... > > Generally I don't think we can permit scenarios where a slot can be in a > "corrupt" state, i.e. missing required catalog entries, after "normal" > administrative commands (i.e. not mucking around in catalog entries / on-disk > files). Even if the sequence of commands may be a bit weird. All such cases > need to be either prevented or detected. > > > As far as I can tell, the way this patch keeps slots on physical replicas > "valid" is solely by reorderbuffer.c blocking during replay via > wait_for_standby_confirmation(). > > Which means that if e.g. the standby_slot_names GUC differs from > synchronize_slot_names on the physical replica, the slots synchronized on the > physical replica are not going to be valid. Or if the primary drops its > logical slots. > > > > Should the redo function for the drop replication slot have the capability > > to drop it on standby and its subscribers (if any) as well? > > Slots are not WAL logged (and shouldn't be). > > I think you pretty much need the recovery conflict handling infrastructure I > referenced upthread, which recognized during replay if a record has a conflict > with a slot on a standby. And then ontop of that you can build something like > this patch. > OK. Understood, thanks Andres. -- With Regards, Ashutosh Sharma.
On Tue, Feb 8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote: > > Which means that if e.g. the standby_slot_names GUC differs from > > synchronize_slot_names on the physical replica, the slots synchronized on the > > physical replica are not going to be valid. Or if the primary drops its > > logical slots. > > > > > > > Should the redo function for the drop replication slot have the capability > > > to drop it on standby and its subscribers (if any) as well? > > > > Slots are not WAL logged (and shouldn't be). > > > > I think you pretty much need the recovery conflict handling infrastructure I > > referenced upthread, which recognized during replay if a record has a conflict > > with a slot on a standby. And then ontop of that you can build something like > > this patch. > > > > OK. Understood, thanks Andres. I would love to see this feature in PG 15. Can someone explain its current status? Thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 10.02.22 22:47, Bruce Momjian wrote:
> On Tue, Feb 8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote:
>>> Which means that if e.g. the standby_slot_names GUC differs from
>>> synchronize_slot_names on the physical replica, the slots synchronized on the
>>> physical replica are not going to be valid. Or if the primary drops its
>>> logical slots.
>>>
>>>> Should the redo function for the drop replication slot have the capability
>>>> to drop it on standby and its subscribers (if any) as well?
>>>
>>> Slots are not WAL logged (and shouldn't be).
>>>
>>> I think you pretty much need the recovery conflict handling infrastructure I
>>> referenced upthread, which recognized during replay if a record has a conflict
>>> with a slot on a standby. And then ontop of that you can build something like
>>> this patch.
>>
>> OK. Understood, thanks Andres.
>
> I would love to see this feature in PG 15. Can someone explain its
> current status? Thanks.

The way I understand it:

1. This feature (probably) depends on the "Minimal logical decoding on standbys" patch. The details there aren't totally clear (to me). That patch had some activity lately, but I don't see it in a state that it's nearing readiness.

2. I think the way this (my) patch is currently written needs some refactoring about how we launch and manage workers. Right now, it's all mangled together with logical replication, since that is a convenient way to launch and manage workers, but it really doesn't need to be tied to logical replication, since it can also be used for other logical slots.

3. It's an open question how to configure this. My patch shows a very minimal configuration that allows you to keep all logical slots always behind one physical slot, which addresses one particular use case. In general, you might have things like: one set of logical slots should stay behind one physical slot, another set behind another physical slot, another set should not care, etc. This could turn into something like the synchronous replication feature, where it ends up with its own configuration language.

Each of these is clearly a significant job on its own.
On 05.02.22 20:59, Andres Freund wrote: > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: >> From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 >> From: Peter Eisentraut<peter@eisentraut.org> >> Date: Mon, 3 Jan 2022 14:43:36 +0100 >> Subject: [PATCH v3] Synchronize logical replication slots from primary to >> standby > I've just skimmed the patch and the related threads. As far as I can tell this > cannot be safely used without the conflict handling in [1], is that correct? This or similar questions have been asked a few times about this or similar patches, but they always come with some doubt. If we think so, it would be useful perhaps if we could come up with test cases that would demonstrate why that other patch/feature is necessary. (I'm not questioning it personally, I'm just throwing out ideas here.)
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 10.02.22 22:47, Bruce Momjian wrote: > > I would love to see this feature in PG 15. Can someone explain its > > current status? Thanks. > > The way I understand it: > ... Hi Peter, I'm starting to review this patch, and last time I checked I noticed it didn't seem to apply cleanly to master anymore. Would you be able to send a rebased version? Thanks, James Coleman
On Fri, Feb 11, 2022 at 9:26 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > The way I understand it: > > 1. This feature (probably) depends on the "Minimal logical decoding on > standbys" patch. The details there aren't totally clear (to me). That > patch had some activity lately but I don't see it in a state that it's > nearing readiness. > > 2. I think the way this (my) patch is currently written needs some > refactoring about how we launch and manage workers. Right now, it's all > mangled together with logical replication, since that is a convenient > way to launch and manage workers, but it really doesn't need to be tied > to logical replication, since it can also be used for other logical slots. > > 3. It's an open question how to configure this. My patch show a very > minimal configuration that allows you to keep all logical slots always > behind one physical slot, which addresses one particular use case. In > general, you might have things like, one set of logical slots should > stay behind one physical slot, another set behind another physical slot, > another set should not care, etc. This could turn into something like > the synchronous replication feature, where it ends up with its own > configuration language. > > Each of these are clearly significant jobs on their own. > Thanks for bringing this topic back up again. I haven't gotten a chance to do any testing on the patch yet, but here are my initial notes from reviewing it: First, reusing the logical replication launcher seems a bit gross. It's obviously a pragmatic choice, but I find it confusing and likely to become only moreso given the fact there's nothing about slot syncing that's inherently limited to logical slots. Plus the feature currently is about syncing slots on a physical replica. So I think that probably should change. Second, it seems to me that the worker-per-DB architecture means that this is unworkable on a cluster with a large number of DBs. The original thread said that was because "logical slots are per database, walrcv_exec needs db connection, etc". As to the walrcv_exec, we're (re)connecting to the primary for each synchronization anyway, so that doesn't seem like a significant reason. I don't understand why logical slots being per-database means we have to do it this way. Is there something about the background worker architecture (I'm revealing my own ignorance here I suppose) that requires this? Also it seems that we reconnect to the primary every time we want to synchronize slots. Maybe that's OK, but it struck me as a bit odd, so I wanted to ask about it. Third, wait_for_standby_confirmation() needs a function comment. Andres noted this earlier, but it seems like we're doing quite a bit of work in this function for each commit. Some of it is obviously duplicative like the parsing of standby_slot_names. The waiting introduced also doesn't seem like a good idea. Andres also commented on that earlier; I'd echo his comments here absent an explanation of why it's preferable/necessary to do it this way. > + if (flush_pos >= commit_lsn && wait_slots_remaining > 0) > + wait_slots_remaining --; I might be missing something re: project style, but the space before the "--" looks odd to my eyes. > * Call either PREPARE (for two-phase transactions) or COMMIT (for > * regular ones). > */ > + > + wait_for_standby_confirmation(commit_lsn); > + > if (rbtxn_prepared(txn)) > rb->prepare(rb, txn, commit_lsn); > else It appears the addition of this call splits the comment from the code it goes with. 
> + * Wait for remote slot to pass localy reserved position. Typo ("localy" -> "locally"). This patch would be a significant improvement for us; I'm hoping we can see some activity on it. I'm also hoping to try to do some testing next week and see if I can poke any holes in the functionality (with the goal of verifying Andres's concerns about the safety without the minimal logical decoding on a replica patch). Thanks, James Coleman
Hi, On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote: > On 05.02.22 20:59, Andres Freund wrote: > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 > > > From: Peter Eisentraut<peter@eisentraut.org> > > > Date: Mon, 3 Jan 2022 14:43:36 +0100 > > > Subject: [PATCH v3] Synchronize logical replication slots from primary to > > > standby > > I've just skimmed the patch and the related threads. As far as I can tell this > > cannot be safely used without the conflict handling in [1], is that correct? > > This or similar questions have been asked a few times about this or similar > patches, but they always come with some doubt. I'm certain it's a problem - the only reason I couched it that way was that there could have been something clever in the patch preventing problems that I missed because I just skimmed it. > If we think so, it would be > useful perhaps if we could come up with test cases that would demonstrate > why that other patch/feature is necessary. (I'm not questioning it > personally, I'm just throwing out ideas here.) The patch as-is just breaks one of the fundamental guarantees necessary for logical decoding, that no row versions can be removed that are still required for logical decoding (signalled via catalog_xmin). So there needs to be an explicit mechanism upholding that guarantee, but there is not right now from what I can see. One piece of the referenced patchset is that it adds information about removed catalog rows to a few WAL records, and then verifies during replay that no record can be replayed that removes resources that are still needed. If such a conflict exists it's dealt with as a recovery conflict. That itself doesn't provide prevention against removal of required rows, but it provides detection. The prevention against removal can then be done using a physical replication slot with hot standby feedback or some other mechanism (e.g. the slot syncing mechanism could maintain a "placeholder" slot on the primary for all sync targets or something like that). Even if that infrastructure existed / was merged, the slot sync stuff would still need some very careful logic to protect against problems due to concurrent WAL replay and "synchronized slot" creation. But that's doable. Greetings, Andres Freund
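To make the catalog_xmin point above a bit more concrete, here is a small sketch (slot name invented) of how the horizon can be observed on both nodes and how the physical-slot-plus-hot-standby-feedback prevention mentioned above is typically wired up; this is standard PostgreSQL, not something the patch adds.
```
-- On either node: catalog_xmin is the horizon that catalog row removal must
-- not overtake while the slot is still needed for logical decoding.
SELECT slot_name, slot_type, catalog_xmin, restart_lsn
FROM pg_replication_slots;

-- Prevention as described above, on the primary (slot name is invented):
SELECT pg_create_physical_replication_slot('standby1_physical');

-- On the standby: point primary_slot_name at that slot and enable feedback.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();
```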
Hello, The status of this patch has already been set to Returned with Feedback. However, I applied this patch on 27b77ecf9f and tested it, so I'm reporting the results. make installcheck-world passes. However, when I promote the standby server and run updates on the new primary server, the apply worker cannot start logical replication and emits the following errors. LOG: background worker "logical replication worker" (PID 14506) exited with exit code 1 LOG: logical replication apply worker for subscription "sub1" has started ERROR: terminating logical replication worker due to timeout LOG: background worker "logical replication worker" (PID 14535) exited with exit code 1 LOG: logical replication apply worker for subscription "sub1" has started It seems that the apply worker does not start because a walsender already exists on the new primary. Do you have any thoughts about what the cause might be? The test script is attached. regards, sho kato
Attachment
On Fri, Feb 18, 2022 at 5:23 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote: > > On 05.02.22 20:59, Andres Freund wrote: > > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > > > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 > > > > From: Peter Eisentraut<peter@eisentraut.org> > > > > Date: Mon, 3 Jan 2022 14:43:36 +0100 > > > > Subject: [PATCH v3] Synchronize logical replication slots from primary to > > > > standby > > > I've just skimmed the patch and the related threads. As far as I can tell this > > > cannot be safely used without the conflict handling in [1], is that correct? > > > > This or similar questions have been asked a few times about this or similar > > patches, but they always come with some doubt. > > I'm certain it's a problem - the only reason I couched it was that there could > have been something clever in the patch preventing problems that I missed > because I just skimmed it. > > > > If we think so, it would be > > useful perhaps if we could come up with test cases that would demonstrate > > why that other patch/feature is necessary. (I'm not questioning it > > personally, I'm just throwing out ideas here.) > > The patch as-is just breaks one of the fundamental guarantees necessary for > logical decoding, that no rows versions can be removed that are still required > for logical decoding (signalled via catalog_xmin). So there needs to be an > explicit mechanism upholding that guarantee, but there is not right now from > what I can see. I've been working on adding test coverage to prove this out, but I've encountered the problem reported in [1]. My assumption (Andres, please correct me if I'm wrong) is that we should see issues with the following steps (given the primary, physical replica, and logical subscriber already created in the test): 1. Ensure both logical subscriber and physical replica are caught up 2. Disable logical subscription 3. Make a catalog change on the primary (currently renaming the primary key column) 4. Vacuum pg_class 5. Ensure physical replication is caught up 6. Stop primary and promote the replica 7. Write to the changed table 8. Update subscription to point to promoted replica 9. Re-enable logical subscription I'm attaching my test as an additional patch in the series for reference. Currently I have steps 3 and 4 commented out to show that the issues in [1] occur without any attempt to trigger the catalog xmin problem. This error seems pretty significant in that it indicates a fundamental lack of test coverage (the primary stated benefit of the patch is physical failover), and it is currently a blocker to testing more deeply. Thanks, James Coleman 1: https://www.postgresql.org/message-id/TYCPR01MB684949EA7AA904EE938548C79F3A9%40TYCPR01MB6849.jpnprd01.prod.outlook.com
Attachment
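For readers following along, the catalog-change portion of the steps above could look roughly like the SQL below; the table and column names are invented, and the TAP test in the attached patch is the authoritative version.
```
-- Step 3: a catalog change on the primary (invented table name).
ALTER TABLE test_tab RENAME COLUMN id TO id_renamed;
-- Step 4: allow the old catalog row versions to be removed.
VACUUM pg_class;
-- Step 6: after the standby has caught up, promote it (run on the standby).
SELECT pg_promote();
-- Step 7: write to the changed table on the new primary.
INSERT INTO test_tab (id_renamed) VALUES (1);
```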
On Thu, Feb 24, 2022 at 12:46 AM James Coleman <jtc331@gmail.com> wrote: > I've been working on adding test coverage to prove this out, but I've > encountered the problem reported in [1]. > > My assumption, but Andres please correct me if I'm wrong, that we > should see issues with the following steps (given the primary, > physical replica, and logical subscriber already created in the test): > > 1. Ensure both logical subscriber and physical replica are caught up > 2. Disable logical subscription > 3. Make a catalog change on the primary (currently renaming the > primary key column) > 4. Vacuum pg_class > 5. Ensure physical replication is caught up > 6. Stop primary and promote the replica > 7. Write to the changed table > 8. Update subscription to point to promoted replica > 9. Re-enable logical subscription > > I'm attaching my test as an additional patch in the series for > reference. Currently I have steps 3 and 4 commented out to show that > the issues in [1] occur without any attempt to trigger the catalog > xmin problem. > > Given this error seems pretty significant in terms of indicating > fundamental lack of test coverage (the primary stated benefit of the > patch is physical failover), and it currently is a blocker to testing > more deeply. A few of my initial concerns raised at [1] are these: 1) Instead of a new LIST_SLOT command, can't we use READ_REPLICATION_SLOT (slight modifications would be needed to make it support logical replication slots and to get more information from the subscriber)? 2) How frequently is the new bg worker going to sync the slot info? How can it ensure that the latest information exists if, say, the subscriber is down/crashed before it picks up the latest slot information? 4) IIUC, the proposal works only for logical replication slots, but do you also see the need for supporting some kind of synchronization of physical replication slots as well? IMO, we need a better and consistent way for both types of replication slots. If the walsender can somehow push the slot info from the primary (for physical replication slots)/publisher (for logical replication slots) to the standby/subscribers, this will be a more consistent and simpler design. However, I'm not sure if this design is doable at all. Can anyone help clarify these? [1] https://www.postgresql.org/message-id/CALj2ACUGNGfWRtwwZwT-Y6feEP8EtOMhVTE87rdeY14mBpsRUA%40mail.gmail.com Regards, Bharath Rupireddy.
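For comparison with the proposed LIST_SLOTS, this is roughly what READ_REPLICATION_SLOT looks like today: it has to be issued on a replication connection and, as noted in point 1, it currently only reports physical slots, so it would need extending for this use. The slot name below is invented.
```
-- Issued over a replication connection,
-- e.g. psql "dbname=postgres replication=database"
READ_REPLICATION_SLOT standby1_physical;
-- Returns: slot_type | restart_lsn | restart_tli
```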
Hi, I have spent little time trying to understand the concern raised by Andres and while doing so I could think of a couple of issues which I would like to share here. Although I'm not quite sure how inline these are with the problems seen by Andres. 1) Firstly, what if we come across a situation where the failover occurs when the confirmed flush lsn has been updated on primary, but is yet to be updated on the standby? I believe this may very well be the case especially considering that standby sends sql queries to the primary to synchronize the replication slots at regular intervals and if the primary dies just after updating the confirmed flush lsn of its logical subscribers then the standby may not be able to get this information/update from the primary which means we'll probably end up having a broken logical replication slot on the new primary. 2) Secondly, if the standby goes down, the logical subscribers will stop receiving new changes from the primary as per the design of this patch OR if standby lags behind the primary for whatever reason, it will have a direct impact on logical subscribers as well. -- With Regards, Ashutosh Sharma. On Sat, Feb 19, 2022 at 3:53 AM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-02-11 15:28:19 +0100, Peter Eisentraut wrote: > > On 05.02.22 20:59, Andres Freund wrote: > > > On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote: > > > > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001 > > > > From: Peter Eisentraut<peter@eisentraut.org> > > > > Date: Mon, 3 Jan 2022 14:43:36 +0100 > > > > Subject: [PATCH v3] Synchronize logical replication slots from primary to > > > > standby > > > I've just skimmed the patch and the related threads. As far as I can tell this > > > cannot be safely used without the conflict handling in [1], is that correct? > > > > This or similar questions have been asked a few times about this or similar > > patches, but they always come with some doubt. > > I'm certain it's a problem - the only reason I couched it was that there could > have been something clever in the patch preventing problems that I missed > because I just skimmed it. > > > > If we think so, it would be > > useful perhaps if we could come up with test cases that would demonstrate > > why that other patch/feature is necessary. (I'm not questioning it > > personally, I'm just throwing out ideas here.) > > The patch as-is just breaks one of the fundamental guarantees necessary for > logical decoding, that no rows versions can be removed that are still required > for logical decoding (signalled via catalog_xmin). So there needs to be an > explicit mechanism upholding that guarantee, but there is not right now from > what I can see. > > One piece of the referenced patchset is that it adds information about removed > catalog rows to a few WAL records, and then verifies during replay that no > record can be replayed that removes resources that are still needed. If such a > conflict exists it's dealt with as a recovery conflict. > > That itself doesn't provide prevention against removal of required, but it > provides detection. The prevention against removal can then be done using a > physical replication slot with hot standby feedback or some other mechanism > (e.g. slot syncing mechanism could maintain a "placeholder" slot on the > primary for all sync targets or something like that). 
> > Even if that infrastructure existed / was merged, the slot sync stuff would > still need some very careful logic to protect against problems due to > concurrent WAL replay and "synchronized slot" creation. But that's doable. > > Greetings, > > Andres Freund > >
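A simple way to watch for the window described in point 1 is to compare the synced slot's position on both nodes around failover time; 'sub1' is an invented slot name, and the query is plain catalog access rather than anything the patch adds.
```
-- Run on the primary and on the standby; a standby value that lags the
-- primary's at promotion time is exactly the gap being raised above.
SELECT slot_name, confirmed_flush_lsn, restart_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_name = 'sub1';
```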
Hi, On 2/11/22 3:26 PM, Peter Eisentraut wrote: > On 10.02.22 22:47, Bruce Momjian wrote: >> On Tue, Feb 8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote: >>>> Which means that if e.g. the standby_slot_names GUC differs from >>>> synchronize_slot_names on the physical replica, the slots >>>> synchronized on the >>>> physical replica are not going to be valid. Or if the primary drops >>>> its >>>> logical slots. >>>> >>>> >>>>> Should the redo function for the drop replication slot have the >>>>> capability >>>>> to drop it on standby and its subscribers (if any) as well? >>>> >>>> Slots are not WAL logged (and shouldn't be). >>>> >>>> I think you pretty much need the recovery conflict handling >>>> infrastructure I >>>> referenced upthread, which recognized during replay if a record has >>>> a conflict >>>> with a slot on a standby. And then ontop of that you can build >>>> something like >>>> this patch. >>>> >>> >>> OK. Understood, thanks Andres. >> >> I would love to see this feature in PG 15. Can someone explain its >> current status? Thanks. > > The way I understand it: > > 1. This feature (probably) depends on the "Minimal logical decoding on > standbys" patch. The details there aren't totally clear (to me). That > patch had some activity lately but I don't see it in a state that it's > nearing readiness. > FWIW, a proposal has been submitted in [1] to add information in the WAL records in preparation for logical slot conflict handling. [1]: https://www.postgresql.org/message-id/178cf7da-9bd7-e328-9c49-e28ac4701352@gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/15/22 10:02 AM, Drouvot, Bertrand wrote: > Hi, > > On 2/11/22 3:26 PM, Peter Eisentraut wrote: >> On 10.02.22 22:47, Bruce Momjian wrote: >>> On Tue, Feb 8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote: >>>>> Which means that if e.g. the standby_slot_names GUC differs from >>>>> synchronize_slot_names on the physical replica, the slots synchronized on the >>>>> physical replica are not going to be valid. Or if the primary drops its >>>>> logical slots. >>>>> >>>>> >>>>>> Should the redo function for the drop replication slot have the capability >>>>>> to drop it on standby and its subscribers (if any) as well? >>>>> >>>>> Slots are not WAL logged (and shouldn't be). >>>>> >>>>> I think you pretty much need the recovery conflict handling infrastructure I >>>>> referenced upthread, which recognized during replay if a record has a conflict >>>>> with a slot on a standby. And then ontop of that you can build something like >>>>> this patch. >>>>> >>>> >>>> OK. Understood, thanks Andres. >>> >>> I would love to see this feature in PG 15. Can someone explain its >>> current status? Thanks. >> >> The way I understand it: >> >> 1. This feature (probably) depends on the "Minimal logical decoding on standbys" patch. The details there aren't totally clear (to me). That patch had some activity lately but I don't see it in a state that it's nearing readiness. >> > > FWIW, a proposal has been submitted in [1] to add information in the WAL records in preparation for logical slot conflict handling. > > [1]: https://www.postgresql.org/message-id/178cf7da-9bd7-e328-9c49-e28ac4701352@gmail.com > Now that the "Minimal logical decoding on standby" patch series (mentioned up-thread) has been committed, I think we can resume working on this one ("Synchronizing slots from primary to standby"). I'll work on a rebase and share it once done (unless someone already started working on a rebase). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 4/14/23 3:22 PM, Drouvot, Bertrand wrote: > Now that the "Minimal logical decoding on standby" patch series (mentioned up-thread) has been > committed, I think we can resume working on this one ("Synchronizing slots from primary to standby"). > > I'll work on a rebase and share it once done (unless someone already started working on a rebase). > Please find attached V5 (a rebase of V4 posted up-thread). In addition to the "rebasing" work, the TAP test adds a test about conflict handling (logical slot invalidation) relying on the work done in the "Minimal logical decoding on standby" patch series. I did not look more at the patch (than what's was needed for the rebase) but plan to do so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Please find attached V5 (a rebase of V4 posted up-thread). > > In addition to the "rebasing" work, the TAP test adds a test about conflict handling (logical slot invalidation) > relying on the work done in the "Minimal logical decoding on standby" patch series. > > I did not look more at the patch (than what's was needed for the rebase) but plan to do so. > Are you still planning to continue working on this? Some miscellaneous comments from going through this patch are as follows: 1. Can you please try to explain the functionality of the overall patch somewhere in the form of comments and/or a commit message? 2. It seems that the initially synchronized list of slots is only used to launch a per-database worker to synchronize all the slots corresponding to that database. If so, then why do we need to fetch all the slot-related information via the LIST_SLOTS command? 3. As mentioned in the initial email, I think it would be better to replace the LIST_SLOTS command with a SELECT query. 4. How is the limit of sync_slot workers decided? Can we document such a piece of information? Do we need a new GUC to decide the number of workers? Ideally, it would be better to avoid a new GUC; can we use any existing logical-replication-worker-related GUC? 5. Can we separate out the functionality related to standby_slot_names in a separate patch, probably the first one? I think that will make the patch easier to review. 6. In libpqrcv_list_slots(), two-phase related slot information is not retrieved. Is there a reason for the same? 7. +static void +wait_for_standby_confirmation(XLogRecPtr commit_lsn) Some comments atop this function would make it easier to review. 8. +/*------------------------------------------------------------------------- + * slotsync.c + * PostgreSQL worker for synchronizing slots to a standby from primary + * + * Copyright (c) 2016-2018, PostgreSQL Global Development Group + * The copyright notice is out-of-date. 9. Why does synchronize_one_slot() compare MyReplicationSlot->data.restart_lsn with the value of confirmed_flush_lsn passed to it? Also, why does it do this only for new slots but not existing slots? 10. Can we somehow test if the restart_lsn is advanced properly after sync? I think it is important to ensure that because otherwise after standby's promotion, the subscriber can start syncing from the wrong position. -- With Regards, Amit Kapila.
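As a sketch for point 3, the SELECT that could replace the LIST_SLOTS command might look like the query below, run over a "replication=database" connection where SQL is allowed; the exact column list is an assumption about what the sync logic needs (note point 6 about two_phase).
```
SELECT slot_name, plugin, slot_type, datoid, database, two_phase,
       restart_lsn, confirmed_flush_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_type = 'logical';
```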
Hi, On 6/16/23 11:56 AM, Amit Kapila wrote: > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Please find attached V5 (a rebase of V4 posted up-thread). >> >> In addition to the "rebasing" work, the TAP test adds a test about conflict handling (logical slot invalidation) >> relying on the work done in the "Minimal logical decoding on standby" patch series. >> >> I did not look more at the patch (than what's was needed for the rebase) but plan to do so. >> > > Are you still planning to continue working on this? Yes, I think it would be great to have such a feature in core. > Some miscellaneous > comments while going through this patch are as follows? Thanks! I'll look at them and will try to come back to you by mid of next week. Also I think we need to handle the case of invalidated replication slot(s): should we drop/recreate it/them? (as the main goal is to have sync slot(s) on the standby). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jun 19, 2023 at 11:34 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Also I think we need to handle the case of invalidated replication slot(s): should > we drop/recreate it/them? (as the main goal is to have sync slot(s) on the standby). > Do you intend to ask what happens to logical slots invalidated (due to, say, max_slot_wal_keep_size) on the publisher? I think those should be invalidated on the standby too. Another thought: is there a chance that the slot on the standby gets invalidated due to a conflict (say, required rows removed on the primary)? I think in such cases the slot on the primary/publisher should have been dropped/invalidated by that time. BTW, does the patch handle the drop of logical slots on the standby when the same slot is dropped on the publisher/primary? -- With Regards, Amit Kapila.
Hi, On 6/19/23 12:03 PM, Amit Kapila wrote: > On Mon, Jun 19, 2023 at 11:34 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Also I think we need to handle the case of invalidated replication slot(s): should >> we drop/recreate it/them? (as the main goal is to have sync slot(s) on the standby). >> > > Do you intend to ask what happens to logical slots invalidated (due to > say max_slot_wal_keep_size) on publisher? I think those should be > invalidated on standby too. Agree that it should behave that way. > Another thought whether there is chance > that the slot on standby gets invalidated due to conflict (say > required rows removed on primary)? That's the scenario I had in mind when asking the question above. > I think in such cases the slot on > primary/publisher should have been dropped/invalidated by that time. I don't think so. For example, such a scenario could occur: - there is no physical slot between the standby and the primary - the standby is shut down - logical decoding on the primary is moving forward and now there is vacuum operations that will conflict on the standby - the standby starts and reports the logical slot being invalidated (while it is not on the primary) In such a case (slot valid on the primary but invalidated on the standby) then I think we could drop and recreate the invalidated slot on the standby. > BTW, does the patch handles drop of logical slots on standby when the > same slot is dropped on publisher/primary? > from what I've seen, yes it looks like it behaves that way (will look closer). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
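With the "Minimal logical decoding on standby" work now committed, the scenario described above is visible on the standby through the slot catalog; 'sub1' is an invented name, and the drop-and-recreate handling being discussed would key off something like this.
```
-- On the standby: an invalidated synchronized slot shows up as conflicting
-- (and its WAL may be reported as lost), while the same slot on the primary
-- can still be perfectly valid.
SELECT slot_name, wal_status, conflicting
FROM pg_replication_slots
WHERE slot_name = 'sub1';
```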
On Mon, Jun 19, 2023 at 9:56 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 6/19/23 12:03 PM, Amit Kapila wrote: > > On Mon, Jun 19, 2023 at 11:34 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Also I think we need to handle the case of invalidated replication slot(s): should > >> we drop/recreate it/them? (as the main goal is to have sync slot(s) on the standby). > >> > > > > Do you intend to ask what happens to logical slots invalidated (due to > > say max_slot_wal_keep_size) on publisher? I think those should be > > invalidated on standby too. > > Agree that it should behave that way. > > > Another thought whether there is chance > > that the slot on standby gets invalidated due to conflict (say > > required rows removed on primary)? > > That's the scenario I had in mind when asking the question above. > > > I think in such cases the slot on > > primary/publisher should have been dropped/invalidated by that time. > > I don't think so. > > For example, such a scenario could occur: > > - there is no physical slot between the standby and the primary > - the standby is shut down > - logical decoding on the primary is moving forward and now there is vacuum > operations that will conflict on the standby > - the standby starts and reports the logical slot being invalidated (while it is > not on the primary) > > In such a case (slot valid on the primary but invalidated on the standby) then I think we > could drop and recreate the invalidated slot on the standby. > Will it be safe? Because after recreating the slot, it will reserve the new WAL location and build the snapshot based on that which might miss some important information in the snapshot. For example, to update the slot's position with new information from the primary, the patch uses pg_logical_replication_slot_advance() which means it will process all records and update the snapshot via DecodeCommit->SnapBuildCommitTxn(). The other related thing is that do we somehow need to ensure that WAL is replayed on standby before moving the slot's position to the target location received from the primary? > > BTW, does the patch handles drop of logical slots on standby when the > > same slot is dropped on publisher/primary? > > > > from what I've seen, yes it looks like it behaves that way (will look closer). > Okay, I have asked because I don't see a call to ReplicationSlotDrop() in the patch. -- With Regards, Amit Kapila.
Hi, On 6/20/23 12:22 PM, Amit Kapila wrote: > On Mon, Jun 19, 2023 at 9:56 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> In such a case (slot valid on the primary but invalidated on the standby) then I think we >> could drop and recreate the invalidated slot on the standby. >> > > Will it be safe? Because after recreating the slot, it will reserve > the new WAL location and build the snapshot based on that which might > miss some important information in the snapshot. For example, to > update the slot's position with new information from the primary, the > patch uses pg_logical_replication_slot_advance() which means it will > process all records and update the snapshot via > DecodeCommit->SnapBuildCommitTxn(). Your concern is that the slot could have been consumed on the standby? I mean, if we suppose the "synchronized" slot can't be consumed on the standby then drop/recreate such an invalidated slot would be ok? Asking, because I'm not sure we should allow consumption of a "synchronized" slot until the standby gets promoted. When the patch has been initially proposed, logical decoding from a standby was not implemented yet. > The other related thing is that do we somehow need to ensure that WAL > is replayed on standby before moving the slot's position to the target > location received from the primary? Yeah, will check if this is currently done that way in the patch proposal. >>> BTW, does the patch handles drop of logical slots on standby when the >>> same slot is dropped on publisher/primary? >>> >> >> from what I've seen, yes it looks like it behaves that way (will look closer). >> > > Okay, I have asked because I don't see a call to ReplicationSlotDrop() > in the patch. > Right. I'd need to look closer to understand how it works (for the moment the "only" thing I've done was the re-base shared up-thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jun 26, 2023 at 11:15 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 6/20/23 12:22 PM, Amit Kapila wrote: > > On Mon, Jun 19, 2023 at 9:56 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > >> In such a case (slot valid on the primary but invalidated on the standby) then I think we > >> could drop and recreate the invalidated slot on the standby. > >> > > > > Will it be safe? Because after recreating the slot, it will reserve > > the new WAL location and build the snapshot based on that which might > > miss some important information in the snapshot. For example, to > > update the slot's position with new information from the primary, the > > patch uses pg_logical_replication_slot_advance() which means it will > > process all records and update the snapshot via > > DecodeCommit->SnapBuildCommitTxn(). > > Your concern is that the slot could have been consumed on the standby? > > I mean, if we suppose the "synchronized" slot can't be consumed on the standby then > drop/recreate such an invalidated slot would be ok? > That also may not be sufficient because as soon as the slot is invalidated/dropped, the required WAL could be removed on standby. -- With Regards, Amit Kapila.
Hi, On 6/26/23 12:34 PM, Amit Kapila wrote: > On Mon, Jun 26, 2023 at 11:15 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 6/20/23 12:22 PM, Amit Kapila wrote: >>> On Mon, Jun 19, 2023 at 9:56 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >> >>>> In such a case (slot valid on the primary but invalidated on the standby) then I think we >>>> could drop and recreate the invalidated slot on the standby. >>>> >>> >>> Will it be safe? Because after recreating the slot, it will reserve >>> the new WAL location and build the snapshot based on that which might >>> miss some important information in the snapshot. For example, to >>> update the slot's position with new information from the primary, the >>> patch uses pg_logical_replication_slot_advance() which means it will >>> process all records and update the snapshot via >>> DecodeCommit->SnapBuildCommitTxn(). >> >> Your concern is that the slot could have been consumed on the standby? >> >> I mean, if we suppose the "synchronized" slot can't be consumed on the standby then >> drop/recreate such an invalidated slot would be ok? >> > > That also may not be sufficient because as soon as the slot is > invalidated/dropped, the required WAL could be removed on standby. > Yeah, I think once the slot is dropped we just have to wait for the slot to be re-created on the standby according to the new synchronize_slot_names GUC. Assuming the initial slot "creation" on the standby (coming from the synchronize_slot_names usage) is working "correctly" then it should also work "correctly" once the slot is dropped. If we agree that a synchronized slot can not/should not be consumed (will implement this behavior) then I think the proposed scenario above should make sense, do you agree? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Dear Drouvot, Hi, I'm also interested in the feature. The following are my high-level comments. I did not mention detailed notation issues because the patch is not at that stage yet. And I'm very sorry that I could not follow all of the discussion. 1. I thought that we should not reuse the logical replication launcher for another purpose. A background worker should have only one task. I wanted to ask the opinions of some other people... 2. I want to confirm the reason why a new replication command is added. IIUC the launcher connects to the primary using the primary_conninfo connection string, but that establishes a physical replication connection, so SQL cannot be executed over it. Is that right? Another approach, though not smart, would be to specify the target database via a GUC. What do you think? 3. You chose the per-db worker approach; however, that makes it difficult to extend the feature to support physical slots. This may be problematic. Were there any reasons for that choice? I suspected that ReplicationSlotCreate() or the advance functions might not be usable from other databases and that may be the reason, but I'm not sure. If these operations can be done without connecting to a specific database, I think the architecture can be changed. 4. Currently the launcher establishes the connection every time. Isn't it better to reuse the same one instead? The following comments assume this configuration, maybe the most straightforward one: primary->standby |->subscriber 5. After constructing the system, I dropped the subscription on the subscriber. In this case the logical slot on the primary was removed, but that was not replicated to the standby server. Is this workload supported or not? ``` $ psql -U postgres -p $port_sub -c "DROP SUBSCRIPTION sub" NOTICE: dropped replication slot "sub" on publisher DROP SUBSCRIPTION $ psql -U postgres -p $port_primary -c "SELECT * FROM pg_replication_slots" slot_name | plugin | slot_type | datoid | database |... -----------+----------+-----------+--------+----------+... (0 rows) $ psql -U postgres -p $port_standby -c "SELECT * FROM pg_replication_slots" slot_name | plugin | slot_type | datoid | database |... -----------+----------+-----------+--------+----------+... sub | pgoutput | logical | 5 | postgres |... (1 row) ``` 6. The current approach may delay the start point of the sync. Assume that the physical replication system is created first, and then the subscriber connects to the publisher node. In this case the launcher connects to the primary earlier than the apply worker and reads the slots. At that time there are no slots on the primary, so the launcher disconnects from the primary and waits for a period (up to 3 min). Even if the apply worker then creates the slot on the publisher, the launcher on the standby cannot notice that. The synchronization may start 3 min later. I'm not sure how to fix this, or whether it is acceptable. Thoughts? Best Regards, Hayato Kuroda FUJITSU LIMITED
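One possible direction for question 2, sketched purely as an assumption and not as what the patch currently does: carry a database name in primary_conninfo so that a worker on the standby can open a database-qualified connection to the primary and run SQL there; the physical walreceiver is expected to ignore dbname, so physical replication behaviour should be unchanged. The connection details below are invented.
```
-- On the standby:
ALTER SYSTEM SET primary_conninfo =
  'host=primary.example port=5432 user=replicator dbname=postgres';
SELECT pg_reload_conf();
```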
On Wed, Jun 28, 2023 at 12:19 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 6/26/23 12:34 PM, Amit Kapila wrote: > > On Mon, Jun 26, 2023 at 11:15 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> On 6/20/23 12:22 PM, Amit Kapila wrote: > >>> On Mon, Jun 19, 2023 at 9:56 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >> > >>>> In such a case (slot valid on the primary but invalidated on the standby) then I think we > >>>> could drop and recreate the invalidated slot on the standby. > >>>> > >>> > >>> Will it be safe? Because after recreating the slot, it will reserve > >>> the new WAL location and build the snapshot based on that which might > >>> miss some important information in the snapshot. For example, to > >>> update the slot's position with new information from the primary, the > >>> patch uses pg_logical_replication_slot_advance() which means it will > >>> process all records and update the snapshot via > >>> DecodeCommit->SnapBuildCommitTxn(). > >> > >> Your concern is that the slot could have been consumed on the standby? > >> > >> I mean, if we suppose the "synchronized" slot can't be consumed on the standby then > >> drop/recreate such an invalidated slot would be ok? > >> > > > > That also may not be sufficient because as soon as the slot is > > invalidated/dropped, the required WAL could be removed on standby. > > > > Yeah, I think once the slot is dropped we just have to wait for the slot to > be re-created on the standby according to the new synchronize_slot_names GUC. > > Assuming the initial slot "creation" on the standby (coming from the synchronize_slot_names usage) > is working "correctly" then it should also work "correctly" once the slot is dropped. > I also think so. > If we agree that a synchronized slot can not/should not be consumed (will implement this behavior) then > I think the proposed scenario above should make sense, do you agree? > Yeah, I also can't think of a use case for this. So, we can probably disallow it and document the same. I guess if we came across a use case for this, we can rethink allowing to consume the changes from synchronized slots. -- With Regards, Amit Kapila.
On Thu, Jun 29, 2023 at 3:52 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Drouvot, > > Hi, I'm also interested in the feature. Followings are my high-level comments. > I did not mention some detailed notations because this patch is not at the stage. > And very sorry that I could not follow all of this discussions. > > 1. I thought that we should not reuse logical replication launcher for another purpose. > The background worker should have only one task. I wanted to ask opinions some other people... > IIUC, the launcher will launch the sync slot workers corresponding to slots that need sync on standby and apply workers for active subscriptions on primary (which will be a subscriber in this context). If this is correct, then do you expect to launch a separate kind of standby launcher for sync slots? -- With Regards, Amit Kapila.
Hi, On 6/29/23 12:36 PM, Amit Kapila wrote: > On Wed, Jun 28, 2023 at 12:19 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> Yeah, I think once the slot is dropped we just have to wait for the slot to >> be re-created on the standby according to the new synchronize_slot_names GUC. >> >> Assuming the initial slot "creation" on the standby (coming from the synchronize_slot_names usage) >> is working "correctly" then it should also work "correctly" once the slot is dropped. >> > > I also think so. > >> If we agree that a synchronized slot can not/should not be consumed (will implement this behavior) then >> I think the proposed scenario above should make sense, do you agree? >> > > Yeah, I also can't think of a use case for this. So, we can probably > disallow it and document the same. I guess if we came across a use > case for this, we can rethink allowing to consume the changes from > synchronized slots. Yeah agree, I'll work on a new version that deals with invalidated slot that way and that ensures that a synchronized slot can't be consumed (until the standby gets promoted). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi Kuroda-san, On 6/29/23 12:22 PM, Hayato Kuroda (Fujitsu) wrote: > Dear Drouvot, > > Hi, I'm also interested in the feature. Followings are my high-level comments. > I did not mention some detailed notations because this patch is not at the stage. > And very sorry that I could not follow all of this discussions. > Thanks for looking at it and your feedback! All I've done so far is to provide a re-based version in April of the existing patch. I'll have a closer look at the code, at your feedback and Amit's one while working on the new version that will: - take care of slot invalidation - ensure that synchronized slot cant' be consumed until the standby gets promoted as discussed up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jun 29, 2023 at 3:52 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > 2. I want to confirm the reason why new replication command is added. > Are you referring LIST_SLOTS command? If so, I don't think that is required and instead, we can use a query to fetch the required information. > IIUC the > launcher connects to primary by using primary_conninfo connection string, but > it establishes the physical replication connection so that any SQL cannot be executed. > Is it right? Another approach not to use is to specify the target database via > GUC, whereas not smart. How do you think? > 3. You chose the per-db worker approach, however, it is difficult to extend the > feature to support physical slots. This may be problematic. Was there any > reasons for that? I doubted ReplicationSlotCreate() or advance functions might > not be used from other databases and these may be reasons, but not sure. > If these operations can do without connecting to specific database, I think > the architecture can be changed. > I think this point needs some investigation but do we want just one worker that syncs all the slots? That may lead to lag in keeping the slots up-to-date. We probably need some tests. > 4. Currently the launcher establishes the connection every time. Isn't it better > to reuse the same one instead? > I feel it is not the launcher but a separate sync slot worker that establishes the connection. It is not clear to me what exactly you have in mind. Can you please explain a bit more? > Following comments are assumed the configuration, maybe the straightfoward: > > primary->standby > |->subscriber > > 5. After constructing the system, I dropped the subscription on the subscriber. > In this case the logical slot on primary was removed, but that was not replicated > to standby server. Did you support the workload or not? > This should work. > > 6. Current approach may delay the startpoint of sync. > > Assuming that physical replication system is created first, and then the > subscriber connects to the publisher node. In this case the launcher connects to > primary earlier than the apply worker, and reads the slot. At that time there are > no slots on primary, so launcher disconnects from primary and waits a time period (up to 3min). > Even if the apply worker creates the slot on publisher, but the launcher on standby > cannot notice that. The synchronization may start 3 min later. > I feel this should be based on some GUC like 'wal_retrieve_retry_interval' which we are already using in the launcher or probably a new one if that doesn't seem to match. -- With Regards, Amit Kapila.
On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Please find attached V5 (a rebase of V4 posted up-thread). > > > > In addition to the "rebasing" work, the TAP test adds a test about conflict handling (logical slot invalidation) > > relying on the work done in the "Minimal logical decoding on standby" patch series. > > > > I did not look more at the patch (than what's was needed for the rebase) but plan to do so. > > > > Are you still planning to continue working on this? Some miscellaneous > comments while going through this patch are as follows? > > 1. Can you please try to explain the functionality of the overall > patch somewhere in the form of comments and or commit message? IIUC, there are 2 core ideas of the feature: 1) It will never let the logical replication subscribers go ahead of physical replication standbys specified in standby_slot_names. It implements this by delaying decoding of commit records on the walsenders corresponding to logical replication subscribers on the primary until all the specified standbys confirm receiving the commit LSN. 2) The physical replication standbys will synchronize data of the specified logical replication slots (in synchronize_slot_names) from the primary, creating the logical replication slots if necessary. Since the logical replication subscribers will never go out of physical replication standbys, the standbys can safely synchronize the slots and keep the data necessary for subscribers to connect to it and work seamlessly even after a failover. If my understanding is right, I have few thoughts here: 1. All the logical walsenders are delayed on the primary - per wait_for_standby_confirmation() despite the user being interested in only a few of them via synchronize_slot_names. Shouldn't the delay be for just the slots specified in synchronize_slot_names? 2. I think we can split the patch like this - 0001 can be the logical walsenders delaying decoding on the primary unless standbys confirm, 0002 standby synchronizing the logical slots. 3. I think we need to change the GUC standby_slot_names to better reflect what it is used for - wait_for_replication_slot_names or wait_for_ 4. It allows specifying logical slots in standby_slot_names, meaning, it can disallow logical slots getting ahead of other logical slots specified in standby_slot_names. Should we allow this case with the thinking that if there's anyone using logical replication for failover (well, will anybody do that in production?). 5. Similar to above, it allows specifying physical slots in synchronize_slot_names. Should we disallow? I'm attaching the v6 patch, a rebased version of v5. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
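Core idea 1 can be eyeballed on the primary with two ordinary monitoring queries: the logical slots' confirmed_flush_lsn should never be ahead of the flush position reported for the standbys listed in standby_slot_names. This is just an illustration of the invariant, not part of the patch.
```
-- Positions handed out to logical subscribers:
SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- Positions confirmed by the physical standbys:
SELECT application_name, flush_lsn
FROM pg_stat_replication;
```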
On Sun, Jul 9, 2023 at 1:01 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 1. Can you please try to explain the functionality of the overall > > patch somewhere in the form of comments and or commit message? > > IIUC, there are 2 core ideas of the feature: > > 1) It will never let the logical replication subscribers go ahead of > physical replication standbys specified in standby_slot_names. It > implements this by delaying decoding of commit records on the > walsenders corresponding to logical replication subscribers on the > primary until all the specified standbys confirm receiving the commit > LSN. > > 2) The physical replication standbys will synchronize data of the > specified logical replication slots (in synchronize_slot_names) from > the primary, creating the logical replication slots if necessary. > Since the logical replication subscribers will never go out of > physical replication standbys, the standbys can safely synchronize the > slots and keep the data necessary for subscribers to connect to it and > work seamlessly even after a failover. > > If my understanding is right, > This matches my understanding as well. > I have few thoughts here: > > 1. All the logical walsenders are delayed on the primary - per > wait_for_standby_confirmation() despite the user being interested in > only a few of them via synchronize_slot_names. Shouldn't the delay be > for just the slots specified in synchronize_slot_names? > 2. I think we can split the patch like this - 0001 can be the logical > walsenders delaying decoding on the primary unless standbys confirm, > 0002 standby synchronizing the logical slots. > Agreed with the above two points. > 3. I think we need to change the GUC standby_slot_names to better > reflect what it is used for - wait_for_replication_slot_names or > wait_for_ > I feel at this stage we can focus on getting the design and implementation correct. We can improve GUC names later once we are confident that the functionality is correct. > 4. It allows specifying logical slots in standby_slot_names, meaning, > it can disallow logical slots getting ahead of other logical slots > specified in standby_slot_names. Should we allow this case with the > thinking that if there's anyone using logical replication for failover > (well, will anybody do that in production?). > I think on the contrary we should prohibit this case. We can always extend this functionality later. > 5. Similar to above, it allows specifying physical slots in > synchronize_slot_names. Should we disallow? > We should prohibit that as well. -- With Regards, Amit Kapila.
On 14.04.23 15:22, Drouvot, Bertrand wrote: > Now that the "Minimal logical decoding on standby" patch series > (mentioned up-thread) has been > committed, I think we can resume working on this one ("Synchronizing > slots from primary to standby"). Maybe you have seen this extension that was released a few months ago: https://github.com/EnterpriseDB/pg_failover_slots . This contains the same functionality packaged as an extension. Maybe this can give some ideas about how this should behave and what options to provide etc. Note that pg_failover_slots doesn't use logical decoding on standby, because that would be too slow in practice. Earlier in this thread we had some discussion about which of the two approaches was preferred. Anyway, that's what's out there.
On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > 3. As mentioned in the initial email, I think it would be better to > replace LIST_SLOTS command with a SELECT query. > I had a look at this thread. I am interested in working on this and can spend some time addressing the comments given here. I tried to replace the LIST_SLOTS command with a SELECT query. Attached are a rebased patch and a PoC patch for LIST_SLOTS removal. For LIST_SLOTS cmd removal, below are the points where more analysis is needed. 1) I could not use the exposed libpqwalreceiver functions walrcv_exec/libpqrcv_exec in LogicalRepLauncher to run a select query instead of the LIST_SLOTS cmd. This is because libpqrcv_exec() needs a database connection, but in LogicalRepLauncher we do not have any (MyDatabaseId is not set), so the API gives an error. Thus, to make it work for the time being, I used 'libpqrcv_PQexec' which is not dependent upon a database connection. But since it is not exposed "yet" to other layers, I temporarily added the new code to libpqwalreceiver.c itself. In fact I reused the existing function wrapper libpqrcv_list_slots and changed the functionality to get the info using a select query rather than list_slots. 2) While using the connect API walrcv_connect/libpqrcv_connect(), we need to tell it whether it is for logical or physical replication. In the existing patch, where we were using the LIST_SLOTS cmd, we have this connection made with logical=false. But now, since we need to run a select query to get the same info, using a connection with logical=false gives an error on the primary while executing the select query: "ERROR: cannot execute SQL commands in WAL sender for physical replication". And thus in ApplyLauncherStartSlotSync(), I have changed the connect API to use logical=true for the time being. I noticed that in the existing patch, it was using logical=false in ApplyLauncherStartSlotSync() while logical=true in synchronize_slots(). Possibly due to the same fact: a logical=false connection will not allow synchronize_slots() to run a select query on the primary, while it worked for ApplyLauncherStartSlotSync() as it was running the list_slots cmd instead of a select query. I am exploring these points further to figure out which one is the correct way to deal with these. Meanwhile, I'm posting this WIP patch for early feedback. I will try to address other comments as well in the next versions. thanks Shveta
Attachment
On Thu, Jul 20, 2023 at 5:05 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > 3. As mentioned in the initial email, I think it would be better to > > replace LIST_SLOTS command with a SELECT query. > > > > I had a look at this thread. I am interested to work on this and can > spend some time addressing the comments given here. Thanks for your interest. Coincidentally, I started to split the patch into 2 recently - 0001 making the specified logical wal senders wait for specified standbys to ack, 0002 synchronize logical slots. I think I'll have these patches ready by early next week. For 0002, I'll consider your latest changes having LIST_SLOTS removed. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Jul 21, 2023 at 11:36 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Jul 20, 2023 at 5:05 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Jun 16, 2023 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > 3. As mentioned in the initial email, I think it would be better to > > > replace LIST_SLOTS command with a SELECT query. > > > > > > > I had a look at this thread. I am interested to work on this and can > > spend some time addressing the comments given here. > > Thanks for your interest. Coincidentally, I started to split the patch > into 2 recently - 0001 making the specified logical wal senders wait > for specified standbys to ack, 0002 synchronize logical slots. I think > I'll have these patches ready by early next week. For 0002, I'll > consider your latest changes having LIST_SLOTS removed. > Thanks Bharat for letting us know. It is okay to split the patch, it may definitely help to understand the modules better but shall we take a step back and try to reevaluate the design first before moving to other tasks? I analyzed more on the issues stated in [1] for replacing LIST_SLOTS with SELECT query. On rethinking, it might not be a good idea to replace this cmd with SELECT in Launcher code-path, because we do not have any database-connection in launcher and 'select' query needs one and thus we need to supply dbname to it. We may take the primary-dbname info in a new GUC from users, but I feel retaining LIST cmd is a better idea over adding a new GUC. But I do not see a reason why we should get complete replication-slots info in LIST command. The logic in Launcher is to get distinct database-ids info out of all the slots and start worker per database-id. Since we are only interested in database-id, I think we should change LIST_SLOTS to something like LIST_DBID_FOR_LOGICAL_SLOTS. This new command may get only unique database-ids for all the logical-slots (or the ones mentioned in synchronize_slot_names) from primary. By doing so, we can avoid huge network traffic in cases where the number of replication slots is quite high considering that max_replication_slots can go upto MAX_BACKENDS:2^18-1. So I plan to make this change where we retain LIST cmd over SELECT query but make this cmd's output restricted to only database-Ids. Thoughts? Secondly, I was thinking if the design proposed in the patch is the best one. No doubt, it is the most simplistic design and thus may prove very efficient for scenarios where we have a reasonable number of workers starting and each one actively busy in slots-synchronisation, handling almost equivalent load. But since we are starting one worker per database id, it may not be most efficient for cases where not all the databases are actively being used. We may have some workers (started for databases not in use) just waking up and sending queries to primary and then going back to sleep and in the process generating network traffic, while others may be heavily loaded to deal with large numbers of active slots for a heavily loaded database. I feel the design should be adaptable to load conditions i.e. if we have more number of actively used slots, then we should have more workers spawned to handle it and when the work is less then the number of spawned workers should be less. 
I have not thought it through thoroughly yet and am also not sure whether it will actually come out as a better one, but this or other such designs should be considered before we start fixing bugs in this patch. Kindly let me know if there are already discussions around this that I might have missed. Any feedback is appreciated. [1]: https://www.postgresql.org/message-id/CAJpy0uCMNz3XERP-Vzp-7rHFztJgc6d%2BxsmUVCqsxWPkZvQz0Q%40mail.gmail.com thanks Shveta
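As a point of comparison for the LIST_DBID_FOR_LOGICAL_SLOTS idea above, here is a minimal sketch of the catalog query such a command could boil down to on the primary; the slot-name filter and the slot names themselves are hypothetical, and with synchronize_slot_names='*' the filter would simply be dropped:

SELECT DISTINCT datoid, database
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND slot_name IN ('sub1_slot', 'sub2_slot');

One row per database that owns at least one slot of interest is all the launcher needs in order to decide how many per-database workers to start.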
On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.malik@gmail.com> wrote: > > Thanks Bharat for letting us know. It is okay to split the patch, it > may definitely help to understand the modules better but shall we take > a step back and try to reevaluate the design first before moving to > other tasks? Agree that design comes first. FWIW, I'm attaching the v9 patch set that I have with me. It can't be a perfect patch set unless the design is finalized. > I analyzed more on the issues stated in [1] for replacing LIST_SLOTS > with SELECT query. On rethinking, it might not be a good idea to > replace this cmd with SELECT in Launcher code-path I think there are open fundamental design aspects to settle before optimizing LIST_SLOTS; see below. I'm sure we can come back to this later. > Secondly, I was thinking if the design proposed in the patch is the > best one. No doubt, it is the most simplistic design and thus may > .......... Any feedback is appreciated. Here are my thoughts about this feature: Current design: 1. On primary, never allow walsenders associated with logical replication slots to go ahead of physical standbys that are candidates for future primary after failover. This enables subscribers to connect to new primary after failover. 2. On all candidate standbys, periodically sync logical slots from primary (creating the slots if necessary) with one slot sync worker per logical slot. Important considerations: 1. Does this design guarantee the row versions required by subscribers aren't removed on candidate standbys as raised here - https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de? It seems safe with logical decoding on standbys feature. Also, a test-case from upthread is already in patch sets (in v9 too) https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com. However, we need to verify the use cases extensively. 2. All candidate standbys will start one slot sync worker per logical slot which might not be scalable. Is having one (or a few more - not necessarily one for each logical slot) worker for all logical slots enough? It seems safe to have one worker for all logical slots - it's not a problem even if the worker takes a bit of time to get to sync a logical slot on a candidate standby, because the standby is ensured to retain all the WAL and row versions required to decode and send to the logical slots. 3. Indefinite waiting of logical walsenders for candidate standbys may not be a good idea. Is having a timeout for logical walsenders a good idea? A problem with a timeout is that it can make logical slots unusable after failover. 4. All candidate standbys retain WAL required by logical slots. The amount of WAL retained may be huge if there's replication lag with the logical replication subscribers. This turns out to be a typical problem with replication, so there's nothing much this feature can do to prevent WAL file accumulation except for asking one to monitor replication lag and WAL file growth. 5. Logical subscribers' replication lag will depend on all candidate standbys' replication lag. If the candidate standbys are far behind the primary while the logical subscribers are close, the logical subscribers will still see replication lag. There's nothing much this feature can do to prevent this except for calling it out in documentation. 6.
This feature might need to prevent the GUCs from deviating on the primary and the candidate standbys - there's no point in syncing a logical slot on candidate standbys if the logical walsender related to it on the primary isn't keeping itself behind all the candidate standbys. If preventing this from happening proves to be tough, calling it out in documentation to keep the GUCs the same is a good start. 7. There are some important review comments provided upthread as far as this design and patches are concerned - https://www.postgresql.org/message-id/20220207204557.74mgbhowydjco4mh%40alap3.anarazel.de and https://www.postgresql.org/message-id/20220207203222.22aktwxrt3fcllru%40alap3.anarazel.de. I'm sure we can come back to these once the design is clear. Please feel free to add to the list if I'm missing anything. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
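On points 4 and 5 above, the monitoring itself is nothing specific to this feature; a minimal sketch of the kind of query an operator could run on the primary to watch per-slot WAL retention (a standby would substitute pg_last_wal_replay_lsn() for pg_current_wal_lsn()):

SELECT slot_name, active, wal_status, safe_wal_size,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';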
Attachment
On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Thanks Bharat for letting us know. It is okay to split the patch, it > > may definitely help to understand the modules better but shall we take > > a step back and try to reevaluate the design first before moving to > > other tasks? > > Agree that design comes first. FWIW, I'm attaching the v9 patch set > that I have with me. It can't be a perfect patch set unless the design > is finalized. > > > I analyzed more on the issues stated in [1] for replacing LIST_SLOTS > > with SELECT query. On rethinking, it might not be a good idea to > > replace this cmd with SELECT in Launcher code-path > > I think there are open fundamental design aspects, before optimizing > LIST_SLOTS, see below. I'm sure we can come back to this later. > > > Secondly, I was thinking if the design proposed in the patch is the > > best one. No doubt, it is the most simplistic design and thus may > > .......... Any feedback is appreciated. > > Here are my thoughts about this feature: > > Current design: > > 1. On primary, never allow walsenders associated with logical > replication slots to go ahead of physical standbys that are candidates > for future primary after failover. This enables subscribers to connect > to new primary after failover. > 2. On all candidate standbys, periodically sync logical slots from > primary (creating the slots if necessary) with one slot sync worker > per logical slot. > > Important considerations: > > 1. Does this design guarantee the row versions required by subscribers > aren't removed on candidate standbys as raised here - > https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de? > > It seems safe with logical decoding on standbys feature. Also, a > test-case from upthread is already in patch sets (in v9 too) > https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com. > However, we need to verify the use cases extensively. > Agreed. > 2. All candidate standbys will start one slot sync worker per logical > slot which might not be scalable. > Yeah, that doesn't sound like a good idea but IIRC, the proposed patch is using one worker per database (for all slots corresponding to a database). > Is having one (or a few more - not > necessarily one for each logical slot) worker for all logical slots > enough? > I guess for a large number of slots the is a possibility of a large gap in syncing the slots which probably means we need to retain corresponding WAL for a much longer time on the primary. If we can prove that the gap won't be large enough to matter then this would be probably worth considering otherwise, I think we should find a way to scale the number of workers to avoid the large gap. -- With Regards, Amit Kapila.
On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Thanks Bharat for letting us know. It is okay to split the patch, it > > > may definitely help to understand the modules better but shall we take > > > a step back and try to reevaluate the design first before moving to > > > other tasks? > > > > Agree that design comes first. FWIW, I'm attaching the v9 patch set > > that I have with me. It can't be a perfect patch set unless the design > > is finalized. > > > > > I analyzed more on the issues stated in [1] for replacing LIST_SLOTS > > > with SELECT query. On rethinking, it might not be a good idea to > > > replace this cmd with SELECT in Launcher code-path > > > > I think there are open fundamental design aspects, before optimizing > > LIST_SLOTS, see below. I'm sure we can come back to this later. > > > > > Secondly, I was thinking if the design proposed in the patch is the > > > best one. No doubt, it is the most simplistic design and thus may > > > .......... Any feedback is appreciated. > > > > Here are my thoughts about this feature: > > > > Current design: > > > > 1. On primary, never allow walsenders associated with logical > > replication slots to go ahead of physical standbys that are candidates > > for future primary after failover. This enables subscribers to connect > > to new primary after failover. > > 2. On all candidate standbys, periodically sync logical slots from > > primary (creating the slots if necessary) with one slot sync worker > > per logical slot. > > > > Important considerations: > > > > 1. Does this design guarantee the row versions required by subscribers > > aren't removed on candidate standbys as raised here - > > https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de? > > > > It seems safe with logical decoding on standbys feature. Also, a > > test-case from upthread is already in patch sets (in v9 too) > > https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com. > > However, we need to verify the use cases extensively. > > > > Agreed. > > > 2. All candidate standbys will start one slot sync worker per logical > > slot which might not be scalable. > > > > Yeah, that doesn't sound like a good idea but IIRC, the proposed patch > is using one worker per database (for all slots corresponding to a > database). > > > Is having one (or a few more - not > > necessarily one for each logical slot) worker for all logical slots > > enough? > > > > I guess for a large number of slots the is a possibility of a large > gap in syncing the slots which probably means we need to retain > corresponding WAL for a much longer time on the primary. If we can > prove that the gap won't be large enough to matter then this would be > probably worth considering otherwise, I think we should find a way to > scale the number of workers to avoid the large gap. > How about this: 1) On standby, spawn 1 worker per database in the start (as it is doing currently). 2) Maintain statistics on activity against each primary's database on standby by any means. Could be by maintaining 'last_synced_time' and 'last_activity_seen time'. The last_synced_time is updated every time we sync/recheck slots for that particular database. 
The 'last_activity_seen_time' changes only if we see any slot on that database where confirmed_flush or restart_lsn has actually changed from what was maintained already. 3) If at any moment we find that 'last_synced_time' - 'last_activity_seen_time' goes beyond a threshold, that means the DB is not currently active; add it to the list of inactive DBs. 4) The launcher, on the other hand, is always checking whether it needs to spawn an extra worker for any new DB. It will additionally check whether the number of inactive databases (maintained on the standby) has gone above some threshold; if so, it brings down the workers for those and starts a common worker which takes care of all such inactive databases (i.e. merges them all into one), while workers for active databases remain as they are (one per DB). Each worker maintains the list of DBs it is responsible for. 5) If, in the list of these inactive databases, we again find an active database using the above logic, then the launcher will spawn a separate worker for it. Pros: fewer workers on the standby, in line with the load on the primary; less poking of the primary by the standby, i.e. the standby will send queries to get slot info for all inactive DBs in one run instead of each worker sending such queries separately. Cons: we might see spawning and freeing of workers more frequently. Please let me know your thoughts on this. thanks Shveta
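One way to picture the 'last_activity_seen_time' bookkeeping sketched above: a per-database worker could run a probe like the following against the primary each cycle (the database name is hypothetical) and mark the database active only if restart_lsn or confirmed_flush_lsn moved for any of its slots since the previous cycle:

SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND database = 'appdb';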
On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Thanks Bharat for letting us know. It is okay to split the patch, it > > may definitely help to understand the modules better but shall we take > > a step back and try to reevaluate the design first before moving to > > other tasks? > > Agree that design comes first. FWIW, I'm attaching the v9 patch set > that I have with me. It can't be a perfect patch set unless the design > is finalized. > Thanks for the patch and for summarizing all the issues here. I was going through the patch and found that we now need to maintain 'synchronize_slot_names' on both primary and standby, unlike the old way where it was maintained only on the standby. I am aware of the problem in the earlier implementation where each logical walsender/slot needed to wait for all standbys to catch up before sending changes to logical subscribers, even though that particular slot did not even need to be synced by any of the standbys. Now it is more restrictive. But is this 'synchronize_slot_names' per standby? If there are multiple standbys, each having different 'synchronize_slot_names' requirements, then how is the primary going to keep track of that? Please let me know if the scenario where standbys have different 'synchronize_slot_names' can never arise. thanks Shveta
On Wed, Jul 26, 2023 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > Is having one (or a few more - not > > > necessarily one for each logical slot) worker for all logical slots > > > enough? > > > > > > > I guess for a large number of slots the is a possibility of a large > > gap in syncing the slots which probably means we need to retain > > corresponding WAL for a much longer time on the primary. If we can > > prove that the gap won't be large enough to matter then this would be > > probably worth considering otherwise, I think we should find a way to > > scale the number of workers to avoid the large gap. > > > > How about this: > > 1) On standby, spawn 1 worker per database in the start (as it is > doing currently). > > 2) Maintain statistics on activity against each primary's database on > standby by any means. Could be by maintaining 'last_synced_time' and > 'last_activity_seen time'. The last_synced_time is updated every time > we sync/recheck slots for that particular database. The > 'last_activity_seen_time' changes only if we get any slot on that > database where actually confirmed_flush or say restart_lsn has changed > from what was maintained already. > > 3) If at any moment, we find that 'last_synced_time' - > 'last_activity_seen' goes beyond a threshold, that means that DB is > not active currently. Add it to list of inactive DB > I think we should also increase the next_sync_time if in current sync, there is no update. > 4) Launcher on the other hand is always checking if it needs to spawn > any other extra worker for any new DB. It will additionally check if > number of inactive databases (maintained on standby) has gone higher > (> some threshold), then it brings down the workers for those and > starts a common worker which takes care of all such inactive databases > (or merge all in 1), while workers for active databases remain as such > (i.e. one per db). Each worker maintains the list of DBs which it is > responsible for. > > 5) If in the list of these inactive databases, we again find any > active database using the above logic, then the launcher will spawn a > separate worker for that. > I wonder if we anyway some sort of design like this because we shouldn't allow to spawn as many workers as the number of databases. There has to be some existing or new GUC like max_sync_slot_workers which decided the number of workers. Overall, this sounds to be a more workload-adaptive approach as compared to the current one. -- With Regards, Amit Kapila.
On Thu, Jul 27, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 26, 2023 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > > Is having one (or a few more - not > > > > necessarily one for each logical slot) worker for all logical slots > > > > enough? > > > > > > > > > > I guess for a large number of slots the is a possibility of a large > > > gap in syncing the slots which probably means we need to retain > > > corresponding WAL for a much longer time on the primary. If we can > > > prove that the gap won't be large enough to matter then this would be > > > probably worth considering otherwise, I think we should find a way to > > > scale the number of workers to avoid the large gap. > > > > > > > How about this: > > > > 1) On standby, spawn 1 worker per database in the start (as it is > > doing currently). > > > > 2) Maintain statistics on activity against each primary's database on > > standby by any means. Could be by maintaining 'last_synced_time' and > > 'last_activity_seen time'. The last_synced_time is updated every time > > we sync/recheck slots for that particular database. The > > 'last_activity_seen_time' changes only if we get any slot on that > > database where actually confirmed_flush or say restart_lsn has changed > > from what was maintained already. > > > > 3) If at any moment, we find that 'last_synced_time' - > > 'last_activity_seen' goes beyond a threshold, that means that DB is > > not active currently. Add it to list of inactive DB > > > > I think we should also increase the next_sync_time if in current sync, > there is no update. +1 > > > 4) Launcher on the other hand is always checking if it needs to spawn > > any other extra worker for any new DB. It will additionally check if > > number of inactive databases (maintained on standby) has gone higher > > (> some threshold), then it brings down the workers for those and > > starts a common worker which takes care of all such inactive databases > > (or merge all in 1), while workers for active databases remain as such > > (i.e. one per db). Each worker maintains the list of DBs which it is > > responsible for. > > > > 5) If in the list of these inactive databases, we again find any > > active database using the above logic, then the launcher will spawn a > > separate worker for that. > > > > I wonder if we anyway some sort of design like this because we > shouldn't allow to spawn as many workers as the number of databases. > There has to be some existing or new GUC like max_sync_slot_workers > which decided the number of workers. > Currently it does not have any such GUC for sync-slot workers. It mainly uses the logical-rep-worker framework for the sync-slot worker part and thus it relies on 'max_logical_replication_workers' GUC. Also it errors out if 'max_replication_slots' is set to zero. I think it is not the correct way of doing things for sync-slot. We can have a new GUC (max_sync_slot_workers) as you suggested and if the number of databases < max_sync_slot_workers, then we can start 1 worker per dbid, else divide the work equally among the max sync-workers possible. And for inactive database cases, we can increase the next_sync_time rather than starting a special worker to handle all the inactive databases. Thoughts? thanks Shveta
On Wed, Jul 26, 2023 at 5:55 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Thanks Bharat for letting us know. It is okay to split the patch, it > > > may definitely help to understand the modules better but shall we take > > > a step back and try to reevaluate the design first before moving to > > > other tasks? > > > > Agree that design comes first. FWIW, I'm attaching the v9 patch set > > that I have with me. It can't be a perfect patch set unless the design > > is finalized. > > > > Thanks for the patch and summarizing all the issues here. I was going > through the patch and found that now we need to maintain > 'synchronize_slot_names' on both primary and standby unlike the old > way where it was maintained only on standby. I am aware of the problem > in earlier implementation where each logical walsender/slot needed to > wait for all standbys to catch-up before sending changes to logical > subscribers even though that particular slot is not even needed to be > synced by any of the standbys. Now it is more restrictive. But now, is > this 'synchronize_slot_names' per standby? If there are multiple > standbys each having different 'synchronize_slot_names' requirements, > then how primary is going to keep track of that? > Please let me know if that scenario can never arise where standbys can > have different 'synchronize_slot_names'. > Can we think of sending 'synchronize_slot_names' from standby to primary at the time of connection? I think we also need to ensure that if the user changes this value then we need to restart the sync slot worker to allow this information to be sent to the primary. We do something similar for apply worker in logical replication. -- With Regards, Amit Kapila.
On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 2. All candidate standbys will start one slot sync worker per logical > > slot which might not be scalable. > > Yeah, that doesn't sound like a good idea but IIRC, the proposed patch > is using one worker per database (for all slots corresponding to a > database). Right. It's based on one worker for each database. > > Is having one (or a few more - not > > necessarily one for each logical slot) worker for all logical slots > > enough? > > I guess for a large number of slots the is a possibility of a large > gap in syncing the slots which probably means we need to retain > corresponding WAL for a much longer time on the primary. If we can > prove that the gap won't be large enough to matter then this would be > probably worth considering otherwise, I think we should find a way to > scale the number of workers to avoid the large gap. I think the gap is largely determined by the time taken to advance each slot and the amount of WAL that each logical slot moves ahead on the primary. I've measured the time it takes for pg_logical_replication_slot_advance with different amounts of WAL on my system. It took 2595ms/5091ms/31238ms to advance the slot by 3.7GB/7.3GB/13GB respectively. To put things into perspective here, imagine there are 3 logical slots to sync for a single slot sync worker and each of them is in need of advancing the slot by 3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after the advance time of slot 1 plus the amount of WAL that the slot has moved ahead on the primary during those 40sec, gets to slot 3 again after the advance time of slots 1 and 2 plus the amount of WAL that the slot has moved ahead on the primary, and so on. If WAL generation on the primary is pretty fast, and if the logical slots move pretty fast on the primary, the time it takes for a single sync worker to sync a slot can increase. Now, let's think about what happens if there's a large gap, IOW, a logical slot on the standby is behind the logical slot on the primary by X amount of WAL. The standby needs to retain more WAL for sure. IIUC, the primary doesn't need to retain the WAL required for a logical slot on the standby, no? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
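For anyone wanting to reproduce a measurement like the one above: pg_logical_replication_slot_advance is the internal routine behind the SQL-callable pg_replication_slot_advance(), so a rough sketch is simply (the slot name is hypothetical, and the target LSN just catches the slot up to the current insert position):

-- with \timing enabled in psql:
SELECT pg_replication_slot_advance('my_logical_slot', pg_current_wal_lsn());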
On Thu, Jul 27, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I wonder if we anyway some sort of design like this because we > shouldn't allow to spawn as many workers as the number of databases. > There has to be some existing or new GUC like max_sync_slot_workers > which decided the number of workers. It seems reasonable to not have one slot sync worker for each database. IMV, the slot sync workers must be generic and independently manageable - generic in the sense that, given a database and primary conninfo, each worker must sync all the slots related to the given database; independently manageable in the sense that there is a separate GUC for the number of sync workers, and they are launchable directly and dynamically by the logical replication launcher. The work division amongst the sync workers can be simple: the logical replication launcher builds a shared memory structure based on the number of slots to sync and starts the sync workers dynamically, and each sync worker picks {dboid, slot name, conninfo} from the shared memory, syncs it and proceeds with other slots. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jul 27, 2023 at 12:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Jul 27, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 26, 2023 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy > > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > > > > Is having one (or a few more - not > > > > > necessarily one for each logical slot) worker for all logical slots > > > > > enough? > > > > > > > > > > > > > I guess for a large number of slots the is a possibility of a large > > > > gap in syncing the slots which probably means we need to retain > > > > corresponding WAL for a much longer time on the primary. If we can > > > > prove that the gap won't be large enough to matter then this would be > > > > probably worth considering otherwise, I think we should find a way to > > > > scale the number of workers to avoid the large gap. > > > > > > > > > > How about this: > > > > > > 1) On standby, spawn 1 worker per database in the start (as it is > > > doing currently). > > > > > > 2) Maintain statistics on activity against each primary's database on > > > standby by any means. Could be by maintaining 'last_synced_time' and > > > 'last_activity_seen time'. The last_synced_time is updated every time > > > we sync/recheck slots for that particular database. The > > > 'last_activity_seen_time' changes only if we get any slot on that > > > database where actually confirmed_flush or say restart_lsn has changed > > > from what was maintained already. > > > > > > 3) If at any moment, we find that 'last_synced_time' - > > > 'last_activity_seen' goes beyond a threshold, that means that DB is > > > not active currently. Add it to list of inactive DB > > > > > > > I think we should also increase the next_sync_time if in current sync, > > there is no update. > > +1 > > > > > > 4) Launcher on the other hand is always checking if it needs to spawn > > > any other extra worker for any new DB. It will additionally check if > > > number of inactive databases (maintained on standby) has gone higher > > > (> some threshold), then it brings down the workers for those and > > > starts a common worker which takes care of all such inactive databases > > > (or merge all in 1), while workers for active databases remain as such > > > (i.e. one per db). Each worker maintains the list of DBs which it is > > > responsible for. > > > > > > 5) If in the list of these inactive databases, we again find any > > > active database using the above logic, then the launcher will spawn a > > > separate worker for that. > > > > > > > I wonder if we anyway some sort of design like this because we > > shouldn't allow to spawn as many workers as the number of databases. > > There has to be some existing or new GUC like max_sync_slot_workers > > which decided the number of workers. > > > > Currently it does not have any such GUC for sync-slot workers. It > mainly uses the logical-rep-worker framework for the sync-slot worker > part and thus it relies on 'max_logical_replication_workers' GUC. Also > it errors out if 'max_replication_slots' is set to zero. I think it is > not the correct way of doing things for sync-slot. 
We can have a new > GUC (max_sync_slot_workers) as you suggested and if the number of > databases < max_sync_slot_workers, then we can start 1 worker per > dbid, else divide the work equally among the max sync-workers > possible. And for inactive database cases, we can increase the > next_sync_time rather than starting a special worker to handle all the > inactive databases. Thoughts? > Attaching the PoC patch (0003), which attempts to implement the basic infrastructure for the suggested design. Rebased the existing patches (0001 and 0002) as well. This patch adds a new GUC max_slot_sync_workers; the default and max values are kept at 2 and 50 respectively for this PoC patch. Now the replication launcher divides the work equally among these many slot-sync workers. Let us say there are multiple slots on the primary belonging to 10 DBs and the new GUC on the standby is set at the default value of 2; then each worker on the standby will manage 5 DBs individually and will keep syncing the slots for them. If a new DB is found by the replication launcher, it will assign this new DB to the worker currently handling the minimum number of DBs (or the first worker in case of an equal count), and that worker will pick up the new DB the next time it tries to sync the slots. I have kept the changes in a separate patch (0003) for ease of review. Since this is just a PoC patch, many things are yet to be done appropriately; I will cover those in the next versions. thanks Shveta
Attachment
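A minimal sketch of how a standby could be configured under the PoC described above; max_slot_sync_workers and synchronize_slot_names are the patch's GUCs, the slot names are hypothetical, and whether a reload suffices (rather than a restart) depends on the GUC context the patch finally settles on:

ALTER SYSTEM SET max_slot_sync_workers = 3;
ALTER SYSTEM SET synchronize_slot_names = 'sub1_slot, sub2_slot';
SELECT pg_reload_conf();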
On Fri, Jul 28, 2023 at 8:54 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Jul 27, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I wonder if we anyway some sort of design like this because we > > shouldn't allow to spawn as many workers as the number of databases. > > There has to be some existing or new GUC like max_sync_slot_workers > > which decided the number of workers. > > It seems reasonable to not have one slot sync worker for each > database. IMV, the slot sync workers must be generic and independently > manageable - generic in the sense that given a database and primary > conninfo, each worker must sync all the slots related to the given > database, independently mangeable in the sense that separate GUC for > number of sync workers, launchable directly by logical replication > launcher dynamically. yes agreed. The patch v10-0003 attempts to do the same. > The work division amongst the sync workers can > be simple, the logical replication launcher builds a shared memory > structure based on number of slots to sync and starts the sync workers > dynamically, and each sync worker picks {dboid, slot name, conninfo} > from the shared memory, syncs it and proceeds with other slots. > Do you mean the logical replication launcher builds a shared memory structure based on the number of 'dbs' to sync as I understood from your initial comment? thanks Shveta
On Tue, Aug 1, 2023 at 5:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > > The work division amongst the sync workers can > > be simple, the logical replication launcher builds a shared memory > > structure based on number of slots to sync and starts the sync workers > > dynamically, and each sync worker picks {dboid, slot name, conninfo} > > from the shared memory, syncs it and proceeds with other slots. > > Do you mean the logical replication launcher builds a shared memory > structure based > on the number of 'dbs' to sync as I understood from your initial comment? Yes. I haven't looked at the 0003 patch posted upthread. However, the standby must do the following at a minimum: - Make GUCs synchronize_slot_names and max_slot_sync_workers of PGC_POSTMASTER type needing postmaster restart when changed as they affect the number of slot sync workers. - LR (logical replication) launcher connects to primary to fetch the logical slots specified in synchronize_slot_names. This is a one-time task. - LR launcher prepares a dynamic shared memory (created via dsm_create) with some state like locks for IPC and an array of {slot_name, dboid_associated_with_slot, is_sync_in_progress} - maximum number of elements in the array is the number of slots specified in synchronize_slot_names. This is a one-time task. - LR launcher decides the *best* number of slot sync workers - (based on some perf numbers) it can just launch, say, one worker per 2 or 4 or 8 etc. slots. - Each slot sync worker then picks up a slot from the DSM, connects to primary using primary conn info, syncs it, and moves to another slot. Not having the capability of on-demand stop/launch of slot sync workers makes the above design simple IMO. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Aug 3, 2023 at 12:28 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Aug 1, 2023 at 5:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > The work division amongst the sync workers can > > > be simple, the logical replication launcher builds a shared memory > > > structure based on number of slots to sync and starts the sync workers > > > dynamically, and each sync worker picks {dboid, slot name, conninfo} > > > from the shared memory, syncs it and proceeds with other slots. > > > > Do you mean the logical replication launcher builds a shared memory > > structure based > > on the number of 'dbs' to sync as I understood from your initial comment? > > Yes. I haven't looked at the 0003 patch posted upthread. However, the > standby must do the following at a minimum: > > - Make GUCs synchronize_slot_names and max_slot_sync_workers of > PGC_POSTMASTER type needing postmaster restart when changed as they > affect the number of slot sync workers. I agree that max_slot_sync_workers should be allowed to change only during startup, but I strongly feel that synchronize_slot_names should be runtime modifiable. We should give that flexibility to the user. > - LR (logical replication) launcher connects to primary to fetch the > logical slots specified in synchronize_slot_names. This is a one-time > task. If synchronize_slot_names='*', we need to fetch the slot info at regular intervals even if it is not runtime modifiable. For the runtime-modifiable case, we obviously need to refetch it at regular intervals. > - LR launcher prepares a dynamic shared memory (created via > dsm_create) with some state like locks for IPC and an array of > {slot_name, dboid_associated_with_slot, is_sync_in_progress} - maximum > number of elements in the array is the number of slots specified in > synchronize_slot_names. This is a one-time task. Yes, we need dynamic shared memory, but it is not a one-time allocation. If it were a one-time allocation, then there would have been no need for DSM; plain shared memory allocation would have been enough. It is not a one-time allocation in any of the designs: if it is a slot-based design, the slots may keep varying for the '*' case, and if it is a DB-based design, the number of DBs may grow beyond the initially allocated memory, so we may need reallocation and a relaunch of the worker, and thus the need for DSM. > - LR launcher decides the *best* number of slot sync workers - (based > on some perf numbers) it can just launch, say, one worker per 2 or 4 > or 8 etc. slots. > - Each slot sync worker then picks up a slot from the DSM, connects to > primary using primary conn info, syncs it, and moves to another slot. > The design based on slots, i.e. the launcher dividing the slots among the available workers, could prove beneficial over DB-based division for a case where the number of slots per DB varies widely and we end up assigning all the DBs with fewer slots to one worker while all the heavily loaded DBs go to another. But other than this, I see a lot of pain points: 1) Since we are going to do slot-based syncing, query construction will be complex. We will have a query with a long 'where' clause: where slot_name in (slot1, slot2, slot3, ...). 2) The number of pings to the primary will be higher, as we are pinging it per slot instead of per DB. So the information which we could have fetched collectively in one query (if it was DB-based) is now split into multiple queries, assuming that there could be cases where slots belonging to the same DB end up being split among different workers.
3) If the number of slots < the max number of workers, how are we going to assign the workers? One slot per worker or all in one worker? If it is one slot per worker, it will again not be that efficient as it will result in more network traffic. This needs more thought and a design that can vary case by case. > Not having the capability of on-demand stop/launch of slot sync > workers makes the above design simple IMO. > We anyway need to relaunch workers when the DSM is reallocated in case the DBs (or say slots) exceed some initial allocation limit. thanks Shveta
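To make the query-construction point concrete, a rough sketch of the two fetch shapes being compared (slot and database names are hypothetical): under slot-based division each worker would build something like the first query with a potentially long IN list, while under DB-based division a worker covers all slots of its databases with the second:

SELECT slot_name, database, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND slot_name IN ('slot_1', 'slot_7', 'slot_42');

SELECT slot_name, database, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND database IN ('db1', 'db2');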
Hi, On 7/24/23 4:32 AM, Bharath Rupireddy wrote: > On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.malik@gmail.com> wrote: > Here are my thoughts about this feature: Thanks for looking at it! > > Important considerations: > > 1. Does this design guarantee the row versions required by subscribers > aren't removed on candidate standbys as raised here - > https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de? > > It seems safe with logical decoding on standbys feature. Also, a > test-case from upthread is already in patch sets (in v9 too) > https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com. > However, we need to verify the use cases extensively. Agree. We also discussed up-thread that we'd have to drop any "sync" slots if they are invalidated. And they should be re-created based on the synchronize_slot_names. > Please feel free to add the list if I'm missing anything. > We'd also have to ensure that "sync" slots can't be consumed on the standby (this has been discussed up-thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >>> 2. All candidate standbys will start one slot sync worker per logical >>> slot which might not be scalable. >> >> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch >> is using one worker per database (for all slots corresponding to a >> database). > > Right. It's based on one worker for each database. > >>> Is having one (or a few more - not >>> necessarily one for each logical slot) worker for all logical slots >>> enough? >> >> I guess for a large number of slots the is a possibility of a large >> gap in syncing the slots which probably means we need to retain >> corresponding WAL for a much longer time on the primary. If we can >> prove that the gap won't be large enough to matter then this would be >> probably worth considering otherwise, I think we should find a way to >> scale the number of workers to avoid the large gap. > > I think the gap is largely determined by the time taken to advance > each slot and the amount of WAL that each logical slot moves ahead on > primary. Sorry to be late, but I gave a second thought and I wonder if we really need this design. (i.e start a logical replication background worker on the standby to sync the slots). Wouldn't that be simpler to "just" update the sync slots "metadata" as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter up-thread) is doing? (making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot() and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() correctly). > I've measured the time it takes for > pg_logical_replication_slot_advance with different amounts WAL on my > system. It took 2595ms/5091ms/31238ms to advance the slot by > 3.7GB/7.3GB/13GB respectively. To put things into perspective here, > imagine there are 3 logical slots to sync for a single slot sync > worker and each of them are in need of advancing the slot by > 3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again > after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after > advance time of slot 1 with amount of WAL that the slot has moved > ahead on primary during 40sec, gets to slot 3 again after advance time > of slot 1 and slot 2 with amount of WAL that the slot has moved ahead > on primary and so on. If WAL generation on the primary is pretty fast, > and if the logical slot moves pretty fast on the primary, the time it > takes for a single sync worker to sync a slot can increase. That would be way "faster" and we would probably not need to worry that much about the number of "sync" workers (if it/they "just" has/have to sync slot's "metadata") as proposed above. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >>> 2. All candidate standbys will start one slot sync worker per logical > >>> slot which might not be scalable. > >> > >> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch > >> is using one worker per database (for all slots corresponding to a > >> database). > > > > Right. It's based on one worker for each database. > > > >>> Is having one (or a few more - not > >>> necessarily one for each logical slot) worker for all logical slots > >>> enough? > >> > >> I guess for a large number of slots the is a possibility of a large > >> gap in syncing the slots which probably means we need to retain > >> corresponding WAL for a much longer time on the primary. If we can > >> prove that the gap won't be large enough to matter then this would be > >> probably worth considering otherwise, I think we should find a way to > >> scale the number of workers to avoid the large gap. > > > > I think the gap is largely determined by the time taken to advance > > each slot and the amount of WAL that each logical slot moves ahead on > > primary. > > Sorry to be late, but I gave a second thought and I wonder if we really need this design. > (i.e start a logical replication background worker on the standby to sync the slots). > > Wouldn't that be simpler to "just" update the sync slots "metadata" > as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter > up-thread) is doing? > (making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot() > and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() correctly). > Agreed. It would be simpler to just update the metadata. I think you have not got chance to review the latest posted patch ('v10-0003') yet, it does the same. But I do not quite get it as in how can we do it w/o starting a background worker? Even the failover-slots extension starts one background worker. The question here is how many background workers we need to have. Will one be sufficient or do we need one per db (as done earlier by the original patches in this thread) or are we good with dividing work among some limited number of workers? I feel syncing all slots in one worker may increase the lag between subsequent syncs for a particular slot and if the number of slots are huge, the chances of losing the slot-data is more in case of failure. Starting one worker per db also might not be that efficient as it will increase load on the system (both in terms of background worker and network traffic) especially for a case where the number of dbs are more. Thus starting max 'n' number of workers where 'n' is decided by GUC and dividing the work/DBs among these looks a better option to me. Please see the discussion in and around the email at [1] [1]: https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com > > I've measured the time it takes for > > pg_logical_replication_slot_advance with different amounts WAL on my > > system. It took 2595ms/5091ms/31238ms to advance the slot by > > 3.7GB/7.3GB/13GB respectively. To put things into perspective here, > > imagine there are 3 logical slots to sync for a single slot sync > > worker and each of them are in need of advancing the slot by > > 3.7GB/7.3GB/13GB of WAL. 
The slot sync worker gets to slot 1 again > > after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after > > advance time of slot 1 with amount of WAL that the slot has moved > > ahead on primary during 40sec, gets to slot 3 again after advance time > > of slot 1 and slot 2 with amount of WAL that the slot has moved ahead > > on primary and so on. If WAL generation on the primary is pretty fast, > > and if the logical slot moves pretty fast on the primary, the time it > > takes for a single sync worker to sync a slot can increase. > > That would be way "faster" and we would probably not need to > worry that much about the number of "sync" workers (if it/they "just" has/have > to sync slot's "metadata") as proposed above. > Agreed, we need not to worry about delay due to pg_logical_replication_slot_advance if we are only going to update a few simple things using the function calls as mentioned above. thanks Shveta
Hi, On 8/4/23 1:32 PM, shveta malik wrote: > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: >> Sorry to be late, but I gave a second thought and I wonder if we really need this design. >> (i.e start a logical replication background worker on the standby to sync the slots). >> >> Wouldn't that be simpler to "just" update the sync slots "metadata" >> as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter >> up-thread) is doing? >> (making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot() >> and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() correctly). >> > > Agreed. It would be simpler to just update the metadata. I think you > have not got chance to review the latest posted patch ('v10-0003') > yet, it does the same. Thanks for the feedback! Yeah, I did not look at v10 in details and was looking at the email thread only. Indeed, I now see that 0003 does update the metadata in local_slot_advance(), that's great! > > But I do not quite get it as in how can we do it w/o starting a > background worker? Yeah, agree that we still need background workers. What I meant was to avoid to use "logical replication background worker" (aka through logicalrep_worker_launch()) to sync the slots. > The question here is how many background workers we > need to have. Will one be sufficient or do we need one per db (as done > earlier by the original patches in this thread) or are we good with > dividing work among some limited number of workers? > > I feel syncing all slots in one worker may increase the lag between > subsequent syncs for a particular slot and if the number of slots are > huge, the chances of losing the slot-data is more in case of failure. > Starting one worker per db also might not be that efficient as it will > increase load on the system (both in terms of background worker and > network traffic) especially for a case where the number of dbs are > more. Thus starting max 'n' number of workers where 'n' is decided by > GUC and dividing the work/DBs among these looks a better option to me. > Please see the discussion in and around the email at [1] > > [1]: https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com Thanks for the link! If I read the email thread correctly, this discussion was before V10 (which is the first version making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot(), LogicalIncreaseRestartDecodingForSlot()) means before the metadata sync only has been implemented. While I agree that the approach to split the sync load among workers with the new max_slot_sync_workers GUC seems reasonable without the sync only feature (pre V10), I'm not sure that with the metadata sync only in place the extra complexity to manage multiple sync workers is needed. Maybe we should start some tests/benchmark with only one sync worker to get numbers and start from there? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Aug 1, 2023 at 4:52 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Jul 27, 2023 at 12:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Jul 27, 2023 at 10:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 26, 2023 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy > > > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > > > > > > Is having one (or a few more - not > > > > > > necessarily one for each logical slot) worker for all logical slots > > > > > > enough? > > > > > > > > > > > > > > > > I guess for a large number of slots the is a possibility of a large > > > > > gap in syncing the slots which probably means we need to retain > > > > > corresponding WAL for a much longer time on the primary. If we can > > > > > prove that the gap won't be large enough to matter then this would be > > > > > probably worth considering otherwise, I think we should find a way to > > > > > scale the number of workers to avoid the large gap. > > > > > > > > > > > > > How about this: > > > > > > > > 1) On standby, spawn 1 worker per database in the start (as it is > > > > doing currently). > > > > > > > > 2) Maintain statistics on activity against each primary's database on > > > > standby by any means. Could be by maintaining 'last_synced_time' and > > > > 'last_activity_seen time'. The last_synced_time is updated every time > > > > we sync/recheck slots for that particular database. The > > > > 'last_activity_seen_time' changes only if we get any slot on that > > > > database where actually confirmed_flush or say restart_lsn has changed > > > > from what was maintained already. > > > > > > > > 3) If at any moment, we find that 'last_synced_time' - > > > > 'last_activity_seen' goes beyond a threshold, that means that DB is > > > > not active currently. Add it to list of inactive DB > > > > > > > > > > I think we should also increase the next_sync_time if in current sync, > > > there is no update. > > > > +1 > > > > > > > > > 4) Launcher on the other hand is always checking if it needs to spawn > > > > any other extra worker for any new DB. It will additionally check if > > > > number of inactive databases (maintained on standby) has gone higher > > > > (> some threshold), then it brings down the workers for those and > > > > starts a common worker which takes care of all such inactive databases > > > > (or merge all in 1), while workers for active databases remain as such > > > > (i.e. one per db). Each worker maintains the list of DBs which it is > > > > responsible for. > > > > > > > > 5) If in the list of these inactive databases, we again find any > > > > active database using the above logic, then the launcher will spawn a > > > > separate worker for that. > > > > > > > > > > I wonder if we anyway some sort of design like this because we > > > shouldn't allow to spawn as many workers as the number of databases. > > > There has to be some existing or new GUC like max_sync_slot_workers > > > which decided the number of workers. > > > > > > > Currently it does not have any such GUC for sync-slot workers. It > > mainly uses the logical-rep-worker framework for the sync-slot worker > > part and thus it relies on 'max_logical_replication_workers' GUC. Also > > it errors out if 'max_replication_slots' is set to zero. 
I think it is > > not the correct way of doing things for sync-slot. We can have a new > > GUC (max_sync_slot_workers) as you suggested and if the number of > > databases < max_sync_slot_workers, then we can start 1 worker per > > dbid, else divide the work equally among the max sync-workers > > possible. And for inactive database cases, we can increase the > > next_sync_time rather than starting a special worker to handle all the > > inactive databases. Thoughts? > > > > Attaching the PoC patch (0003) where attempts to implement the basic > infrastructure for the suggested design. Rebased the existing patches > (0001 and 0002) as well. > > This patch adds a new GUC max_slot_sync_workers; the default and max > value is kept at 2 and 50 respectively for this PoC patch. Now the > replication launcher divides the work equally among these many > slot-sync workers. Let us say there are multiple slots on primary > belonging to 10 DBs and say new GUC on standby is set at default value > of 2, then each worker on standby will manage 5 dbs individually and > will keep on synching the slots for them. If a new DB is found by > replication launcher, it will assign this new db to the worker > handling the minimum number of dbs currently (or first worker in case > of equal count) and that worker will pick up the new db the next time > it tries to sync the slots. > I have kept the changes in separate patches (003) for ease of review. > Since this is just a PoC patch, many things are yet to be done > appropriately, will cover those in next versions. > Attaching a new set of patches which attempts to implement the changes below: 1) The logical replication launcher now gets only the list of unique dbids belonging to slots in 'synchronize_slot_names' instead of getting all the slot data. This has been implemented using the new command LIST_DBID_FOR_LOGICAL_SLOTS. 2) The launcher assigns the DBs to sync slot workers. Each worker will have its own dbids list. Since the upper limit of this dbid count is not known, it is now allocated using DSM. The launcher initially allocates memory to hold 100 dbids for each worker. If this limit is exhausted, it reallocates this memory with the size incremented by 100 and relaunches the worker. This relaunched worker will continue to have the existing set of DBs it was managing earlier plus the new DB. Both these changes are in patch v11-0002. The earlier patch v10-0003 is now merged into 0002 itself. More on the standby-side design of this PoC patch can be found in the commit message of v11-0002. Thanks Ajin for working on (1). thanks Shveta
Attachment
On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 8/4/23 1:32 PM, shveta malik wrote: > > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > > >> Sorry to be late, but I gave a second thought and I wonder if we really need this design. > >> (i.e start a logical replication background worker on the standby to sync the slots). > >> > >> Wouldn't that be simpler to "just" update the sync slots "metadata" > >> as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter > >> up-thread) is doing? > >> (making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot() > >> and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() correctly). > >> > > > > Agreed. It would be simpler to just update the metadata. I think you > > have not got chance to review the latest posted patch ('v10-0003') > > yet, it does the same. > > Thanks for the feedback! Yeah, I did not look at v10 in details and was > looking at the email thread only. > > Indeed, I now see that 0003 does update the metadata in local_slot_advance(), > that's great! > > > > > But I do not quite get it as in how can we do it w/o starting a > > background worker? > > Yeah, agree that we still need background workers. > What I meant was to avoid to use "logical replication background worker" > (aka through logicalrep_worker_launch()) to sync the slots. > Agreed. That is why in v10,v11 patches, we have different infra for sync-slot worker i.e. it is not relying on "logical replication background worker" anymore. > > The question here is how many background workers we > > need to have. Will one be sufficient or do we need one per db (as done > > earlier by the original patches in this thread) or are we good with > > dividing work among some limited number of workers? > > > > I feel syncing all slots in one worker may increase the lag between > > subsequent syncs for a particular slot and if the number of slots are > > huge, the chances of losing the slot-data is more in case of failure. > > Starting one worker per db also might not be that efficient as it will > > increase load on the system (both in terms of background worker and > > network traffic) especially for a case where the number of dbs are > > more. Thus starting max 'n' number of workers where 'n' is decided by > > GUC and dividing the work/DBs among these looks a better option to me. > > Please see the discussion in and around the email at [1] > > > > [1]: https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com > > Thanks for the link! If I read the email thread correctly, this discussion > was before V10 (which is the first version making use of LogicalConfirmReceivedLocation(), > LogicalIncreaseXminForSlot(), LogicalIncreaseRestartDecodingForSlot()) means > before the metadata sync only has been implemented. > > While I agree that the approach to split the sync load among workers with the new > max_slot_sync_workers GUC seems reasonable without the sync only feature (pre V10), > I'm not sure that with the metadata sync only in place the extra complexity to manage multiple > sync workers is needed. > > Maybe we should start some tests/benchmark with only one sync worker to get numbers > and start from there? Yes, we can do that performance testing to figure out the difference between the two modes. 
I will try to get some statistics on this. thanks Shveta
Hi, On 8/8/23 7:01 AM, shveta malik wrote: > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 8/4/23 1:32 PM, shveta malik wrote: >>> On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >>>> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: >> > > Agreed. That is why in v10,v11 patches, we have different infra for > sync-slot worker i.e. it is not relying on "logical replication > background worker" anymore. yeah saw that, looks like the right way to go to me. >> Maybe we should start some tests/benchmark with only one sync worker to get numbers >> and start from there? > > Yes, we can do that performance testing to figure out the difference > between the two modes. I will try to get some statistics on this. > Great, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 8/8/23 7:01 AM, shveta malik wrote: > > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 8/4/23 1:32 PM, shveta malik wrote: > >>> On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >>>> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > >> > > > > Agreed. That is why in v10,v11 patches, we have different infra for > > sync-slot worker i.e. it is not relying on "logical replication > > background worker" anymore. > > yeah saw that, looks like the right way to go to me. > > >> Maybe we should start some tests/benchmark with only one sync worker to get numbers > >> and start from there? > > > > Yes, we can do that performance testing to figure out the difference > > between the two modes. I will try to get some statistics on this. > > > > Great, thanks! > We (myself and Ajin) performed the tests to compute the lag in standby slots as compared to primary slots with different number of slot-sync workers configured. 3 DBs were created, each with 30 tables and each table having one logical-pub/sub configured. So this made a total of 90 logical replication slots to be synced. Then the workload was run for aprox 10 mins. During this workload, at regular intervals, primary and standby slots' lsns were captured (from pg_replication_slots) and compared. At each capture, the intent was to know how much is each standby's slot lagging behind corresponding primary's slot by taking the distance between confirmed_flush_lsn of primary and standby slot. Then we took the average (integer value) of this distance over the span of 10 min workload and this is what we got: With max_slot_sync_workers=1, average-lag = 42290.3563 With max_slot_sync_workers=2, average-lag = 24585.1421 With max_slot_sync_workers=3, average-lag = 14964.9215 This shows that more workers have better chances to keep logical replication slots in sync for this case. Another statistics if it interests you is, we ran a frequency test as well (this by changing code, unit test sort of) to figure out the 'total number of times synchronization done' with different number of sync-slots workers configured. Same 3 DBs setup with each DB having 30 logical replication slots. With 'max_slot_sync_workers' set at 1, 2 and 3; total number of times synchronization done was 15874, 20205 and 23414 respectively. Note: this is not on the same machine where we captured lsn-gap data, it is on a little less efficient machine but gives almost the same picture. Next we are planning to capture this data for a lesser number of slots like 10,30,50 etc. It may happen that the benefit of multi-workers over single workers in such cases could be less, but let's have the data to verify that. Thanks Ajin for jointly working on this. thanks Shveta
On Mon, Aug 14, 2023 at 3:22 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 8/8/23 7:01 AM, shveta malik wrote: > > > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > >> > > >> Hi, > > >> > > >> On 8/4/23 1:32 PM, shveta malik wrote: > > >>> On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand > > >>> <bertranddrouvot.pg@gmail.com> wrote: > > >>>> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > > >> > > > > > > Agreed. That is why in v10,v11 patches, we have different infra for > > > sync-slot worker i.e. it is not relying on "logical replication > > > background worker" anymore. > > > > yeah saw that, looks like the right way to go to me. > > > > >> Maybe we should start some tests/benchmark with only one sync worker to get numbers > > >> and start from there? > > > > > > Yes, we can do that performance testing to figure out the difference > > > between the two modes. I will try to get some statistics on this. > > > > > > > Great, thanks! > > > > We (myself and Ajin) performed the tests to compute the lag in standby > slots as compared to primary slots with different number of slot-sync > workers configured. > > 3 DBs were created, each with 30 tables and each table having one > logical-pub/sub configured. So this made a total of 90 logical > replication slots to be synced. Then the workload was run for aprox 10 > mins. During this workload, at regular intervals, primary and standby > slots' lsns were captured (from pg_replication_slots) and compared. At > each capture, the intent was to know how much is each standby's slot > lagging behind corresponding primary's slot by taking the distance > between confirmed_flush_lsn of primary and standby slot. Then we took > the average (integer value) of this distance over the span of 10 min > workload and this is what we got: > I have attached the scripts for schema-setup, running workload and capturing lag. Please go through Readme for details. thanks Shveta
On Mon, Aug 14, 2023 at 8:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Aug 14, 2023 at 3:22 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > On 8/8/23 7:01 AM, shveta malik wrote: > > > > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > >> > > > >> Hi, > > > >> > > > >> On 8/4/23 1:32 PM, shveta malik wrote: > > > >>> On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand > > > >>> <bertranddrouvot.pg@gmail.com> wrote: > > > >>>> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > > > >> > > > > > > > > Agreed. That is why in v10,v11 patches, we have different infra for > > > > sync-slot worker i.e. it is not relying on "logical replication > > > > background worker" anymore. > > > > > > yeah saw that, looks like the right way to go to me. > > > > > > >> Maybe we should start some tests/benchmark with only one sync worker to get numbers > > > >> and start from there? > > > > > > > > Yes, we can do that performance testing to figure out the difference > > > > between the two modes. I will try to get some statistics on this. > > > > > > > > > > Great, thanks! > > > > > > > We (myself and Ajin) performed the tests to compute the lag in standby > > slots as compared to primary slots with different number of slot-sync > > workers configured. > > > > 3 DBs were created, each with 30 tables and each table having one > > logical-pub/sub configured. So this made a total of 90 logical > > replication slots to be synced. Then the workload was run for aprox 10 > > mins. During this workload, at regular intervals, primary and standby > > slots' lsns were captured (from pg_replication_slots) and compared. At > > each capture, the intent was to know how much is each standby's slot > > lagging behind corresponding primary's slot by taking the distance > > between confirmed_flush_lsn of primary and standby slot. Then we took > > the average (integer value) of this distance over the span of 10 min > > workload and this is what we got: > > > > I have attached the scripts for schema-setup, running workload and > capturing lag. Please go through Readme for details. > > I did some more tests for 10,20 and 40 slots to calculate the average lsn distance between slots, comparing 1 worker and 3 workers. My results are as follows: 10 slots 1 worker: 5529.75527426 (average lsn distance between primary and standby per slot) 3 worker: 2224.57589134 20 slots 1 worker: 9592.87234043 3 worker: 3194.62933333 40 slots 1 worker: 20566.0933333 3 worker: 7885.80952381 90 slots 1 worker: 36706.8405797 3 worker: 10236.6393162 regards, Ajin Cherian Fujitsu Australia
Hi, On 8/14/23 11:52 AM, shveta malik wrote: > > We (myself and Ajin) performed the tests to compute the lag in standby > slots as compared to primary slots with different number of slot-sync > workers configured. > Thanks! > 3 DBs were created, each with 30 tables and each table having one > logical-pub/sub configured. So this made a total of 90 logical > replication slots to be synced. Then the workload was run for aprox 10 > mins. During this workload, at regular intervals, primary and standby > slots' lsns were captured (from pg_replication_slots) and compared. At > each capture, the intent was to know how much is each standby's slot > lagging behind corresponding primary's slot by taking the distance > between confirmed_flush_lsn of primary and standby slot. Then we took > the average (integer value) of this distance over the span of 10 min > workload Thanks for the explanations, make sense to me. > and this is what we got: > > With max_slot_sync_workers=1, average-lag = 42290.3563 > With max_slot_sync_workers=2, average-lag = 24585.1421 > With max_slot_sync_workers=3, average-lag = 14964.9215 > > This shows that more workers have better chances to keep logical > replication slots in sync for this case. > Agree. > Another statistics if it interests you is, we ran a frequency test as > well (this by changing code, unit test sort of) to figure out the > 'total number of times synchronization done' with different number of > sync-slots workers configured. Same 3 DBs setup with each DB having 30 > logical replication slots. With 'max_slot_sync_workers' set at 1, 2 > and 3; total number of times synchronization done was 15874, 20205 and > 23414 respectively. Note: this is not on the same machine where we > captured lsn-gap data, it is on a little less efficient machine but > gives almost the same picture > > Next we are planning to capture this data for a lesser number of slots > like 10,30,50 etc. It may happen that the benefit of multi-workers > over single workers in such cases could be less, but let's have the > data to verify that. > Thanks a lot for those numbers and for the testing! Do you think it would make sense to also get the number of using the pg_failover_slots module? (and compare the pg_failover_slots numbers with the "one worker" case here). Idea is to check if the patch does introduce some overhead as compare to pg_failover_slots. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 8/14/23 11:52 AM, shveta malik wrote: > > > > > We (myself and Ajin) performed the tests to compute the lag in standby > > slots as compared to primary slots with different number of slot-sync > > workers configured. > > > > Thanks! > > > 3 DBs were created, each with 30 tables and each table having one > > logical-pub/sub configured. So this made a total of 90 logical > > replication slots to be synced. Then the workload was run for aprox 10 > > mins. During this workload, at regular intervals, primary and standby > > slots' lsns were captured (from pg_replication_slots) and compared. At > > each capture, the intent was to know how much is each standby's slot > > lagging behind corresponding primary's slot by taking the distance > > between confirmed_flush_lsn of primary and standby slot. Then we took > > the average (integer value) of this distance over the span of 10 min > > workload > > Thanks for the explanations, make sense to me. > > > and this is what we got: > > > > With max_slot_sync_workers=1, average-lag = 42290.3563 > > With max_slot_sync_workers=2, average-lag = 24585.1421 > > With max_slot_sync_workers=3, average-lag = 14964.9215 > > > > This shows that more workers have better chances to keep logical > > replication slots in sync for this case. > > > > Agree. > > > Another statistics if it interests you is, we ran a frequency test as > > well (this by changing code, unit test sort of) to figure out the > > 'total number of times synchronization done' with different number of > > sync-slots workers configured. Same 3 DBs setup with each DB having 30 > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2 > > and 3; total number of times synchronization done was 15874, 20205 and > > 23414 respectively. Note: this is not on the same machine where we > > captured lsn-gap data, it is on a little less efficient machine but > > gives almost the same picture > > > > Next we are planning to capture this data for a lesser number of slots > > like 10,30,50 etc. It may happen that the benefit of multi-workers > > over single workers in such cases could be less, but let's have the > > data to verify that. > > > > Thanks a lot for those numbers and for the testing! > > Do you think it would make sense to also get the number of using > the pg_failover_slots module? (and compare the pg_failover_slots numbers with the > "one worker" case here). Idea is to check if the patch does introduce > some overhead as compare to pg_failover_slots. > Yes, definitely. We will work on that and share the numbers soon. thanks Shveta
On Thu, Aug 17, 2023 at 11:55 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 8/14/23 11:52 AM, shveta malik wrote: > > > > > > > > We (myself and Ajin) performed the tests to compute the lag in standby > > > slots as compared to primary slots with different number of slot-sync > > > workers configured. > > > > > > > Thanks! > > > > > 3 DBs were created, each with 30 tables and each table having one > > > logical-pub/sub configured. So this made a total of 90 logical > > > replication slots to be synced. Then the workload was run for aprox 10 > > > mins. During this workload, at regular intervals, primary and standby > > > slots' lsns were captured (from pg_replication_slots) and compared. At > > > each capture, the intent was to know how much is each standby's slot > > > lagging behind corresponding primary's slot by taking the distance > > > between confirmed_flush_lsn of primary and standby slot. Then we took > > > the average (integer value) of this distance over the span of 10 min > > > workload > > > > Thanks for the explanations, make sense to me. > > > > > and this is what we got: > > > > > > With max_slot_sync_workers=1, average-lag = 42290.3563 > > > With max_slot_sync_workers=2, average-lag = 24585.1421 > > > With max_slot_sync_workers=3, average-lag = 14964.9215 > > > > > > This shows that more workers have better chances to keep logical > > > replication slots in sync for this case. > > > > > > > Agree. > > > > > Another statistics if it interests you is, we ran a frequency test as > > > well (this by changing code, unit test sort of) to figure out the > > > 'total number of times synchronization done' with different number of > > > sync-slots workers configured. Same 3 DBs setup with each DB having 30 > > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2 > > > and 3; total number of times synchronization done was 15874, 20205 and > > > 23414 respectively. Note: this is not on the same machine where we > > > captured lsn-gap data, it is on a little less efficient machine but > > > gives almost the same picture > > > > > > Next we are planning to capture this data for a lesser number of slots > > > like 10,30,50 etc. It may happen that the benefit of multi-workers > > > over single workers in such cases could be less, but let's have the > > > data to verify that. > > > > > > > Thanks a lot for those numbers and for the testing! > > > > Do you think it would make sense to also get the number of using > > the pg_failover_slots module? (and compare the pg_failover_slots numbers with the > > "one worker" case here). Idea is to check if the patch does introduce > > some overhead as compare to pg_failover_slots. > > > > Yes, definitely. We will work on that and share the numbers soon. > We are working on these tests. Meanwhile attaching the patches which attempt to implement below functionalities: 1) Remove extra slots on standby if those no longer exist on primary or are no longer part of synchronize_slot_names. 2) Make synchronize_slot_names user-modifiable. And due to change in 'synchronize_slot_names', if dbids list is reduced, then take care of removal of extra/old db-ids (if any) from workers db-list. Thanks Ajin for working on 1. Both the above changes are in patch-0002. There is a test failure in the recovery module due to these new changes, I am looking into it and will fix it in the next version. 
Improvements in pipeline:
a) Standby slots should not be consumable.
b) Optimization of the query which the standby sends to the primary. Currently it has a dbid filter and a slot-name filter. The slot-name filter can be optimized to carry only the slots belonging to the DBs assigned to the worker, rather than all of 'synchronize_slot_names'.
c) Analyze whether the naptime of the slot-sync worker can be auto-tuned. If there is no activity (i.e. slots are not advancing on the primary), increase the naptime of the slot-sync worker on the standby and decrease it again when activity resumes.

thanks
Shveta
On Thu, Aug 17, 2023 at 11:55 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 8/14/23 11:52 AM, shveta malik wrote: > > > > > > > > We (myself and Ajin) performed the tests to compute the lag in standby > > > slots as compared to primary slots with different number of slot-sync > > > workers configured. > > > > > > > Thanks! > > > > > 3 DBs were created, each with 30 tables and each table having one > > > logical-pub/sub configured. So this made a total of 90 logical > > > replication slots to be synced. Then the workload was run for aprox 10 > > > mins. During this workload, at regular intervals, primary and standby > > > slots' lsns were captured (from pg_replication_slots) and compared. At > > > each capture, the intent was to know how much is each standby's slot > > > lagging behind corresponding primary's slot by taking the distance > > > between confirmed_flush_lsn of primary and standby slot. Then we took > > > the average (integer value) of this distance over the span of 10 min > > > workload > > > > Thanks for the explanations, make sense to me. > > > > > and this is what we got: > > > > > > With max_slot_sync_workers=1, average-lag = 42290.3563 > > > With max_slot_sync_workers=2, average-lag = 24585.1421 > > > With max_slot_sync_workers=3, average-lag = 14964.9215 > > > > > > This shows that more workers have better chances to keep logical > > > replication slots in sync for this case. > > > > > > > Agree. > > > > > Another statistics if it interests you is, we ran a frequency test as > > > well (this by changing code, unit test sort of) to figure out the > > > 'total number of times synchronization done' with different number of > > > sync-slots workers configured. Same 3 DBs setup with each DB having 30 > > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2 > > > and 3; total number of times synchronization done was 15874, 20205 and > > > 23414 respectively. Note: this is not on the same machine where we > > > captured lsn-gap data, it is on a little less efficient machine but > > > gives almost the same picture > > > > > > Next we are planning to capture this data for a lesser number of slots > > > like 10,30,50 etc. It may happen that the benefit of multi-workers > > > over single workers in such cases could be less, but let's have the > > > data to verify that. > > > > > > > Thanks a lot for those numbers and for the testing! > > > > Do you think it would make sense to also get the number of using > > the pg_failover_slots module? (and compare the pg_failover_slots numbers with the > > "one worker" case here). Idea is to check if the patch does introduce > > some overhead as compare to pg_failover_slots. > > > > Yes, definitely. We will work on that and share the numbers soon. > Here are the numbers for pg_failover_extension. Thank You Ajin for performing all the tests and providing the data offline. 
--------------------------------------
pg_failover_slots extension:
--------------------------------------
40 slots:
  default nap (60 sec): 12742133.96
  10ms nap:             19984.34
90 slots:
  default nap (60 sec): 10063342.72
  10ms nap:             34483.82

--------------------------------------
slot-sync-workers case (default 10ms nap for each test):
--------------------------------------
40 slots:
  1 worker:  20566.09
  3 workers: 7885.80
90 slots:
  1 worker:  36706.84
  3 workers: 10236.63

Observations:
1) The worker=1 case is slightly behind pg_failover_slots in our implementation (for the same naptime of 10ms). This is due to the support for the multi-worker design, where locks and DSM come into play. I will review this case for optimization.
2) The multi-worker case is clearly better in all tests.

A few points we observed while testing pg_failover_slots:
1) It has a naptime of 60 sec, which is on the higher side, so we see a huge lag in slots being synchronized; please see the default-nap readings above. The extension's default data is therefore not comparable to our default case, and for an apples-to-apples comparison we changed its naptime to 10ms.
2) It takes a long time to create slots. Every slot creation needs workload to be run on the primary, i.e. if after, say, the 4th slot creation there is no activity on the primary, it waits and does not proceed to create the rest of the slots; we had to make sure some activity ran on the primary in parallel with each slot creation on the standby. This happens because after each slot creation it checks whether 'remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn' and, if so, waits for the primary to catch up. The restart_lsn of a newly created slot is set at the XLOG-replay position, and when the standby is up to date in terms of data (i.e. all WAL streams are received and replayed) and no activity is going on on the primary, the restart_lsn on the standby for a newly created slot is the same as the confirmed_flush_lsn of that slot on the primary. So in order to proceed, it needs the restart_lsn on the primary to move forward. Does it make more sense to have a check which compares the confirmed_flush of the primary with the restart_lsn on the standby, i.e. wait for the primary to catch up only if 'remote_slot->confirmed_flush < MyReplicationSlot->data.restart_lsn'? That would mean we wait only if more operations were performed on the primary and the WAL has been received and replayed on the standby, while the slot on the primary has still not advanced, so we need to give the primary time to catch up.

thanks
Shveta
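To make the proposed check concrete, here is a sketch of the relaxed wait condition; the names follow the pg_failover_slots snippet quoted above ('remote_slot', 'MyReplicationSlot'), and the enclosing wait/retry loop is assumed rather than shown:

/*
 * Current behaviour (as quoted above): wait whenever
 *     remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn
 *
 * Proposed relaxation: wait only while the primary's confirmed_flush is
 * still behind the restart_lsn of the freshly created local slot.
 */
if (remote_slot->confirmed_flush < MyReplicationSlot->data.restart_lsn)
{
    /* primary slot has not advanced far enough yet; wait and retry */
}
else
{
    /* safe to continue with the newly created slot */
}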
On Thu, Aug 17, 2023 at 4:09 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Aug 17, 2023 at 11:55 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > On 8/14/23 11:52 AM, shveta malik wrote: > > > > > > > > > > > We (myself and Ajin) performed the tests to compute the lag in standby > > > > slots as compared to primary slots with different number of slot-sync > > > > workers configured. > > > > > > > > > > Thanks! > > > > > > > 3 DBs were created, each with 30 tables and each table having one > > > > logical-pub/sub configured. So this made a total of 90 logical > > > > replication slots to be synced. Then the workload was run for aprox 10 > > > > mins. During this workload, at regular intervals, primary and standby > > > > slots' lsns were captured (from pg_replication_slots) and compared. At > > > > each capture, the intent was to know how much is each standby's slot > > > > lagging behind corresponding primary's slot by taking the distance > > > > between confirmed_flush_lsn of primary and standby slot. Then we took > > > > the average (integer value) of this distance over the span of 10 min > > > > workload > > > > > > Thanks for the explanations, make sense to me. > > > > > > > and this is what we got: > > > > > > > > With max_slot_sync_workers=1, average-lag = 42290.3563 > > > > With max_slot_sync_workers=2, average-lag = 24585.1421 > > > > With max_slot_sync_workers=3, average-lag = 14964.9215 > > > > > > > > This shows that more workers have better chances to keep logical > > > > replication slots in sync for this case. > > > > > > > > > > Agree. > > > > > > > Another statistics if it interests you is, we ran a frequency test as > > > > well (this by changing code, unit test sort of) to figure out the > > > > 'total number of times synchronization done' with different number of > > > > sync-slots workers configured. Same 3 DBs setup with each DB having 30 > > > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2 > > > > and 3; total number of times synchronization done was 15874, 20205 and > > > > 23414 respectively. Note: this is not on the same machine where we > > > > captured lsn-gap data, it is on a little less efficient machine but > > > > gives almost the same picture > > > > > > > > Next we are planning to capture this data for a lesser number of slots > > > > like 10,30,50 etc. It may happen that the benefit of multi-workers > > > > over single workers in such cases could be less, but let's have the > > > > data to verify that. > > > > > > > > > > Thanks a lot for those numbers and for the testing! > > > > > > Do you think it would make sense to also get the number of using > > > the pg_failover_slots module? (and compare the pg_failover_slots numbers with the > > > "one worker" case here). Idea is to check if the patch does introduce > > > some overhead as compare to pg_failover_slots. > > > > > > > Yes, definitely. We will work on that and share the numbers soon. > > > > We are working on these tests. Meanwhile attaching the patches which > attempt to implement below functionalities: > > 1) Remove extra slots on standby if those no longer exist on primary > or are no longer part of synchronize_slot_names. > 2) Make synchronize_slot_names user-modifiable. And due to change in > 'synchronize_slot_names', if dbids list is reduced, then take care of > removal of extra/old db-ids (if any) from workers db-list. 
> > Thanks Ajin for working on 1. Both the above changes are in > patch-0002. There is a test failure in the recovery module due to > these new changes, I am looking into it and will fix it in the next > version. > > Improvements in pipeline: > a) standby slots should not be consumable. > b) optimization of query which standby sends to primary. Currently it > has dbid filter and slot-name filter. Slot-name filter can be > optimized to have only those slots which belong to DBs assigned to the > worker rather than having all 'synchronize_slot_names'. > c) analyze if the naptime of the slot-sync worker can be auto-tuned. > If there is no activity going on (i.e. slots are not advancing on > primary) then increase naptime of slot-sync worker on standby and > decrease it again when activity starts. > Please find the patches attached. 0002 has below changes: 1) The naptime of the worker is now tuned as per the activity on primary. Each worker starts with a naptime of 10ms and if no activity is observed on primary for some time, then naptime is increased to 10sec. And if activity is observed again, naptime is reduced back to 10ms. Each worker does it by choosing one slot (first one assigned to it) for monitoring purposes. If there is no change in lsn of that slot for say over 5 sync-checks, naptime is increased to 10sec and as soon as a change is observed, naptime is reduced back to 10ms. 2) The query sent by standby to primary to get slot info is written better. The query has filters : where DBID in (...) and slot_name in (..). Earlier the slot_name filter was carrying all the names mentioned in synchronize_slot_names (if it is not '*'). Now it mentions only the ones belonging to its own dbids except during the first run of the query. First run of the query is different since we are getting this info ('which slot belongs to which db') from standby only, thus the query will have all slots-names of 'synchronize_slot_names ' until slots are created on standby. This one-time longer query seems better over pinging primary to get this info. Changes to be done/analysed next: 1) find a way to distinguish between user created logical slots and synced ones. This is needed for below purposes: a) Avoid dropping user created slots by slot-sync worker. b) Unlike the user-created slots, synced slots should not be consumable. 2) Handling below corner scenarios: a) if a worker is exiting due to change in sync_slot_names which made dbids of that worker no longer valid, then that worker may leave behind some slots which should otherwise be dropped. b) if a worker is connected to a dbid and that dbid no longer exists. 3) Analyze if there is any interference with 'minimal logical decoding on standby' feature. thanks Shveta
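A minimal sketch of the naptime tuning described in point 1 above, assuming the monitor fields (confirmed_lsn, inactivity_count) of the patch's SlotSyncWorkerWatchSlot struct; the constants, the helper name and the exact thresholds are illustrative, not the patch's code:

#define SLOTSYNC_NAPTIME_MIN_MS        10
#define SLOTSYNC_NAPTIME_MAX_MS        10000
#define SLOTSYNC_INACTIVITY_THRESHOLD  5

static long
slotsync_adjust_naptime(SlotSyncWorkerWatchSlot *monitor,
                        XLogRecPtr latest_lsn, long naptime_ms)
{
    if (latest_lsn == monitor->confirmed_lsn)
    {
        /* the watched slot did not move in this cycle */
        if (++monitor->inactivity_count >= SLOTSYNC_INACTIVITY_THRESHOLD)
            naptime_ms = SLOTSYNC_NAPTIME_MAX_MS;
    }
    else
    {
        /* activity resumed: remember the new position and speed up again */
        monitor->confirmed_lsn = latest_lsn;
        monitor->inactivity_count = 0;
        naptime_ms = SLOTSYNC_NAPTIME_MIN_MS;
    }

    return naptime_ms;
}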
On Wed, Aug 23, 2023 at 3:38 PM shveta malik <shveta.malik@gmail.com> wrote: > I have reviewed the v12-0002 patch and I have some comments. I see the latest version posted sometime back and if any of this comment is already fixed in this version then feel free to ignore that. In general, code still needs a lot more comments to make it readable and in some places, code formatting is also not as per PG standard so that needs to be improved. There are some other specific comments as listed below 1. @@ -925,7 +936,7 @@ ApplyLauncherRegister(void) memset(&bgw, 0, sizeof(bgw)); bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION; - bgw.bgw_start_time = BgWorkerStart_RecoveryFinished; + bgw.bgw_start_time = BgWorkerStart_ConsistentState; What is the reason for this change, can you add some comments? 2. ApplyLauncherShmemInit(void) { bool found; + bool foundSlotSync; Is there any specific reason to launch the sync worker from the logical launcher instead of making this independent? I mean in the future if we plan to sync physical slots as well then it wouldn't be an expandable design. 3. + /* + * Remember the old dbids before we stop and cleanup this worker + * as these will be needed in order to relaunch the worker. + */ + copied_dbcnt = worker->dbcount; + copied_dbids = (Oid *)palloc0(worker->dbcount * sizeof(Oid)); + + for (i = 0; i < worker->dbcount; i++) + copied_dbids[i] = worker->dbids[i]; probably we can just memcpy the memory containing the dbids. 4. + /* + * If worker is being reused, and there is vacancy in dbids array, + * just update dbids array and dbcount and we are done. + * But if dbids array is exhausted, stop the worker, reallocate + * dbids in dsm, relaunch the worker with same set of dbs as earlier + * plus the new db. + */ Why do we need to relaunch the worker, can't we just use dsa_pointer to expand the shared memory whenever required? 5. +static bool +WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker, + uint16 generation, + BackgroundWorkerHandle *handle) this function is an exact duplicate of WaitForReplicationWorkerAttach except for the LWlock, why don't we use the same function by passing the LWLock as a parameter 6. +/* + * Attach Slot-sync worker to worker-slot assigned by launcher. + */ +void +slotsync_worker_attach(int slot) this is also very similar to the logicalrep_worker_attach function. Please check other similar functions and reuse them wherever possible Also, why this function is not registering the cleanup function on shmmem_exit? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
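For point 3 above, a minimal sketch of the memcpy-based copy being suggested, reusing the variable names from the quoted code (palloc0 is no longer needed since memcpy overwrites the whole array):

copied_dbcnt = worker->dbcount;
copied_dbids = (Oid *) palloc(worker->dbcount * sizeof(Oid));
memcpy(copied_dbids, worker->dbids, worker->dbcount * sizeof(Oid));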
On Wed, Aug 23, 2023 at 4:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Aug 23, 2023 at 3:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > > I have reviewed the v12-0002 patch and I have some comments. I see the > latest version posted sometime back and if any of this comment is > already fixed in this version then feel free to ignore that. > Thanks for the feedback. Please find my comments on a few. I will work on rest. > 2. > ApplyLauncherShmemInit(void) > { > bool found; > + bool foundSlotSync; > > Is there any specific reason to launch the sync worker from the > logical launcher instead of making this independent? > I mean in the future if we plan to sync physical slots as well then it > wouldn't be an expandable design. When we started working on this, it was reusing logical-apply worker infra, so I separated it from logical-apply worker but let it be managed by a replication launcher considering that only logical slots needed to be synced. I think this needs more thought and I would like to know from others as well before concluding anything here. > 5. > > +static bool > +WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker, > + uint16 generation, > + BackgroundWorkerHandle *handle) > > this function is an exact duplicate of WaitForReplicationWorkerAttach > except for the LWlock, why don't we use the same function by passing > the LWLock as a parameter > Here workers (first argument) are different. We can always pass LWLock, but since workers are different, in order to merge the common functionality, we need to have some common worker structure between the two workers (apply and sync-slot) and pass that to functions which need to be merged (similar to NodeTag used in Insert/CreateStmt etc). But changing LogicalRepWorker() would mean changing applyworker/table-sync worker/parallel-apply-worker files. Since there are only two such functions which you pointed out (attach and wait_for_attach), I prefered to keep the functions as is until we conclude on where slot-sync worker functionality actually fits in. I can revisit these comments then. Or if you see any better way to do it, kindly let me know. > 6. > +/* > + * Attach Slot-sync worker to worker-slot assigned by launcher. > + */ > +void > +slotsync_worker_attach(int slot) > > this is also very similar to the logicalrep_worker_attach function. > > Please check other similar functions and reuse them wherever possible > > Also, why this function is not registering the cleanup function on shmmem_exit? > It is doing it in ReplSlotSyncMain() since we have dsm-seg there. Please see this: /* Primary initialization is complete. Now, attach to our slot. */ slotsync_worker_attach(worker_slot); before_shmem_exit(slotsync_worker_detach, PointerGetDatum(seg)); thanks Shveta
Wait a minute ...

From bac0fbef8b203c530e5117b0b7cfee13cfab78b9 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Sat, 22 Jul 2023 10:17:48 +0000
Subject: [PATCH v13 1/2] Allow logical walsenders to wait for physical standbys

@@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	}
 	else
 	{
+		/*
+		 * Before we send out the last set of changes to logical decoding
+		 * output plugin, wait for specified streaming replication standby
+		 * servers (if any) to confirm receipt of WAL upto commit_lsn.
+		 */
+		WaitForStandbyLSN(commit_lsn);

OK, so we call this new function frequently enough -- once per transaction, if I read this correctly?  So ...

+void
+WaitForStandbyLSN(XLogRecPtr wait_for_lsn)
+{
...
+	/* "*" means all logical walsenders should wait for physical standbys. */
+	if (strcmp(synchronize_slot_names, "*") != 0)
+	{
+		bool shouldwait = false;
+
+		rawname = pstrdup(synchronize_slot_names);
+		SplitIdentifierString(rawname, ',', &elemlist);
+
+		foreach (l, elemlist)
+		{
+			char *name = lfirst(l);
+			if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0)
+			{
+				shouldwait = true;
+				break;
+			}
+		}
+
+		pfree(rawname);
+		rawname = NULL;
+		list_free(elemlist);
+		elemlist = NIL;
+
+		if (!shouldwait)
+			return;
+	}
+
+	rawname = pstrdup(standby_slot_names);
+	SplitIdentifierString(rawname, ',', &elemlist);

... do we really want to be doing the GUC string parsing every time through it?  This sounds like it could be a bottleneck, or at least slow things down.  Maybe we should think about caching this somehow.

--
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."
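One possible shape for the caching being suggested here, as a sketch only: split the GUC once (at startup and whenever the value changes) into a backend-local list, and let WaitForStandbyLSN() walk the cached list instead of calling SplitIdentifierString() for every transaction. The helper and the cached-list variable below are hypothetical names, not the patch's code:

static List *cached_standby_slot_names = NIL;

static void
rebuild_standby_slot_names_cache(void)
{
    MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
    char       *rawname = pstrdup(standby_slot_names);
    List       *namelist = NIL;
    ListCell   *lc;

    list_free_deep(cached_standby_slot_names);
    cached_standby_slot_names = NIL;

    /* syntax was already validated when the GUC value was set */
    (void) SplitIdentifierString(rawname, ',', &namelist);

    /* copy the names, since namelist points into rawname */
    foreach(lc, namelist)
        cached_standby_slot_names = lappend(cached_standby_slot_names,
                                            pstrdup((char *) lfirst(lc)));

    pfree(rawname);
    list_free(namelist);
    MemoryContextSwitchTo(oldcxt);
}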
On Fri, Aug 25, 2023 at 11:09 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Aug 23, 2023 at 4:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Aug 23, 2023 at 3:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > I have reviewed the v12-0002 patch and I have some comments. I see the > > latest version posted sometime back and if any of this comment is > > already fixed in this version then feel free to ignore that. > > > > Thanks for the feedback. Please find my comments on a few. I will work on rest. > > > > 2. > > ApplyLauncherShmemInit(void) > > { > > bool found; > > + bool foundSlotSync; > > > > Is there any specific reason to launch the sync worker from the > > logical launcher instead of making this independent? > > I mean in the future if we plan to sync physical slots as well then it > > wouldn't be an expandable design. > > When we started working on this, it was reusing logical-apply worker > infra, so I separated it from logical-apply worker but let it be > managed by a replication launcher considering that only logical slots > needed to be synced. I think this needs more thought and I would like > to know from others as well before concluding anything here. > > > > 5. > > > > +static bool > > +WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker, > > + uint16 generation, > > + BackgroundWorkerHandle *handle) > > > > this function is an exact duplicate of WaitForReplicationWorkerAttach > > except for the LWlock, why don't we use the same function by passing > > the LWLock as a parameter > > > > Here workers (first argument) are different. We can always pass > LWLock, but since workers are different, in order to merge the common > functionality, we need to have some common worker structure between > the two workers (apply and sync-slot) and pass that to functions which > need to be merged (similar to NodeTag used in Insert/CreateStmt etc). > But changing LogicalRepWorker() would mean changing > applyworker/table-sync worker/parallel-apply-worker files. Since there > are only two such functions which you pointed out (attach and > wait_for_attach), I prefered to keep the functions as is until we > conclude on where slot-sync worker functionality actually fits in. I > can revisit these comments then. Or if you see any better way to do > it, kindly let me know. > > > 6. > > +/* > > + * Attach Slot-sync worker to worker-slot assigned by launcher. > > + */ > > +void > > +slotsync_worker_attach(int slot) > > > > this is also very similar to the logicalrep_worker_attach function. > > > > Please check other similar functions and reuse them wherever possible > > > > Also, why this function is not registering the cleanup function on shmmem_exit? > > > > It is doing it in ReplSlotSyncMain() since we have dsm-seg there. > Please see this: > > /* Primary initialization is complete. Now, attach to our slot. */ > slotsync_worker_attach(worker_slot); > before_shmem_exit(slotsync_worker_detach, PointerGetDatum(seg)); > PFA new patch-set which attempts to fix these: a) Synced slots on standby are not consumable i.e. pg_logical_slot_get/peek_changes will give error on these while will work on user-created slots. b) User created slots on standby will not be dropped by slot-sync workers anymore. Earlier slot-sync worker was dropping all the slots which were not part of synchronize_slot_names. c) Now DSA is being used for dbids to facilitate memory extension if required without needing to restart the worker. 
Earlier, DSM alone was used, which required a restart of the worker whenever the allocated memory needed to be extended. These changes are in patch 0002.

Next in pipeline:
1. Handling of the corner scenarios I explained in: https://www.postgresql.org/message-id/CAJpy0uC%2B2agRtF3H%3Dn-hW5JkoPfaZkjPXJr%3D%3Dy3_PRE04dQvhw%40mail.gmail.com
2. Revisiting the comments (older ones in this thread as well as the latest ones) for patch 1.

thanks
Shveta
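A sketch of how the dbids array can now grow in place using the DSA API (dsa_allocate/dsa_get_address/dsa_free are the real API; the helper, the capacity argument and the fixed growth increment of 100 are illustrative, and locking is omitted):

static void
slotsync_add_dbid(SlotSyncWorker *worker, Oid dbid, uint32 capacity)
{
    if (worker->dbcount == capacity)
    {
        /* allocate a bigger chunk and copy the existing dbids over */
        dsa_pointer newdp = dsa_allocate(worker->dbids_dsa,
                                         (capacity + 100) * sizeof(Oid));
        Oid        *newids = (Oid *) dsa_get_address(worker->dbids_dsa, newdp);
        Oid        *oldids = (Oid *) dsa_get_address(worker->dbids_dsa,
                                                     worker->dbids_dp);

        memcpy(newids, oldids, worker->dbcount * sizeof(Oid));
        dsa_free(worker->dbids_dsa, worker->dbids_dp);
        worker->dbids_dp = newdp;
    }

    /* append the new database; the worker picks it up on its next cycle */
    ((Oid *) dsa_get_address(worker->dbids_dsa, worker->dbids_dp))[worker->dbcount++] = dbid;
}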
On Wed, Aug 23, 2023 at 4:21 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Aug 23, 2023 at 3:38 PM shveta malik <shveta.malik@gmail.com> wrote: > > > I have reviewed the v12-0002 patch and I have some comments. I see the > latest version posted sometime back and if any of this comment is > already fixed in this version then feel free to ignore that. > > In general, code still needs a lot more comments to make it readable > and in some places, code formatting is also not as per PG standard so > that needs to be improved. > There are some other specific comments as listed below > Please see the latest patch-set (v14). Did some code-formatting, used pg_indent as well. Added more comments. Let me know specifically if some more comments or formatting is needed. > 1. > @@ -925,7 +936,7 @@ ApplyLauncherRegister(void) > memset(&bgw, 0, sizeof(bgw)); > bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | > BGWORKER_BACKEND_DATABASE_CONNECTION; > - bgw.bgw_start_time = BgWorkerStart_RecoveryFinished; > + bgw.bgw_start_time = BgWorkerStart_ConsistentState; > > What is the reason for this change, can you add some comments? Sure, done. > > 2. > ApplyLauncherShmemInit(void) > { > bool found; > + bool foundSlotSync; > > Is there any specific reason to launch the sync worker from the > logical launcher instead of making this independent? > I mean in the future if we plan to sync physical slots as well then it > wouldn't be an expandable design. > > 3. > + /* > + * Remember the old dbids before we stop and cleanup this worker > + * as these will be needed in order to relaunch the worker. > + */ > + copied_dbcnt = worker->dbcount; > + copied_dbids = (Oid *)palloc0(worker->dbcount * sizeof(Oid)); > + > + for (i = 0; i < worker->dbcount; i++) > + copied_dbids[i] = worker->dbids[i]; > > probably we can just memcpy the memory containing the dbids. Done. > > 4. > + /* > + * If worker is being reused, and there is vacancy in dbids array, > + * just update dbids array and dbcount and we are done. > + * But if dbids array is exhausted, stop the worker, reallocate > + * dbids in dsm, relaunch the worker with same set of dbs as earlier > + * plus the new db. > + */ > > Why do we need to relaunch the worker, can't we just use dsa_pointer > to expand the shared memory whenever required? > Done. > 5. > > +static bool > +WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker, > + uint16 generation, > + BackgroundWorkerHandle *handle) > > this function is an exact duplicate of WaitForReplicationWorkerAttach > except for the LWlock, why don't we use the same function by passing > the LWLock as a parameter > > 6. > +/* > + * Attach Slot-sync worker to worker-slot assigned by launcher. > + */ > +void > +slotsync_worker_attach(int slot) > > this is also very similar to the logicalrep_worker_attach function. > > Please check other similar functions and reuse them wherever possible > Will revisit these as stated in [1]. [1]: https://www.postgresql.org/message-id/CAJpy0uDeWjJj%2BU8nn%2BHbnGWkfY%2Bn-Bbw_kuHqgphETJ1Lucy%2BQ%40mail.gmail.com > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com
Hi Shveta. Here are some comments for patch v14-0002 The patch is large, so my code review is a WIP... more later next week... ====== GENERAL 1. Patch size The patch is 2700 lines. Is it possible to break this up into smaller self-contained parts to make the reviews more manageable? ~~~ 2. PG Docs I guess there are missing PG Docs for this patch. E.g there are new GUCs added but I see no documentation yet for them. ~ 3. Terms There are variations of what to call the sync worker - "Slot sync worker" or - "slot-sync worker" or - "slot synchronization worker" or - "slot-synchronization worker" - and others These are all in the comments and messages etc. Better to search/replace to make a consistent term everywhere. FWIW, I preferred just to call it "slot-sync worker". ~ 4. typedefs I think multiple new typedefs are added by this patch. IIUC, those should be included in the file typedef.list so the pg_indent will work properly. 5. max_slot_sync_workers GUC There is already a 'max_sync_workers_per_subscription', but that one is for "tablesync" workers. IMO it is potentially confusing now that both these GUCs have 'sync_workers' in the name. I think it would be less ambiguous to change your new GUC to 'max_slotsync_workers'. ====== Commit Message 6. Overview? I felt the information in this commit message is describing details of what changes are in this patch but there is no synopsis about the *purpose* of this patch as a whole. Eg. What is it for? It seemed like there should be some introductory paragraph up-front before describing all the specifics. ~~~ 7. For slots to be synchronised, another GUC is added: synchronize_slot_names: This is a runtime modifiable GUC. ~ If this is added by this patch then how come there is some SGML describing the same GUC in patch 14-0001? What is the relationship? ~~~ 8. Let us say slots mentioned in 'synchronize_slot_names' on primary belongs to 10 DBs and say the new GUC is set at default value of 2, then each worker will manage 5 dbs and will keep on synching the slots for them. ~ /the new GUC is set at default value of 2/'max_slot_sync_workers' is 2/ ~~~ 9. If a new DB is found by replication launcher, it will assign this new db to the worker handling the minimum number of dbs currently (or first worker in case of equal count) ~ Hmm. Isn't this only describing cases where max_slot_workers was exceeded? Otherwise, you should just launch a brand new sync-worker, right? ~~~ 10. Each worker slot will have its own dbids list. ~ It seems confusing to say "worker slot" when already talking about workers and slots. Can you reword that more like "Each slot-sync worker will have its own dbids list"? ====== src/backend/postmaster/bgworker.c ====== .../libpqwalreceiver/libpqwalreceiver.c 11. libpqrcv_list_db_for_logical_slots +/* + * List DB for logical slots + * + * It gets the list of unique DBIDs for logical slots mentioned in slot_names + * from primary. + */ +static List * +libpqrcv_list_db_for_logical_slots(WalReceiverConn *conn, Comment needs some minor tweaking. ~~~ 12. + if (strcmp(slot_names, "") != 0 && strcmp(slot_names, "*") != 0) + { + char *rawname; + List *namelist; + ListCell *lc; + + appendStringInfoChar(&s, ' '); + rawname = pstrdup(slot_names); + SplitIdentifierString(rawname, ',', &namelist); + foreach (lc, namelist) + { + if (lc != list_head(namelist)) + appendStringInfoChar(&s, ','); + appendStringInfo(&s, "%s", + quote_identifier(lfirst(lc))); + } + } /rawname/rawnames/ ~~~ 13. 
+ if (PQresultStatus(res) != PGRES_TUPLES_OK) + { + PQclear(res); + ereport(ERROR, + (errmsg("could not receive list of slots the primary server: %s", + pchomp(PQerrorMessage(conn->streamConn))))); + } /the primary server/from the primary server/ ~~~ 14. + if (PQnfields(res) < 1) + { + int nfields = PQnfields(res); + + PQclear(res); + ereport(ERROR, + (errmsg("invalid response from primary server"), + errdetail("Could not get list of slots: got %d fields, " + "expected %d or more fields.", + nfields, 1))); + } This code seems over-complicated. If it is < 1 then it can only be zero, right? So then what is the point of calculating and displaying the 'nfields' which can only be 0? ~~~ 15. + ntuples = PQntuples(res); + for (int i = 0; i < ntuples; i++) + { + + slot_data = palloc0(sizeof(WalRecvReplicationSlotDbData)); + if (!PQgetisnull(res, i, 0)) + slot_data->database = atooid(PQgetvalue(res, i, 0)); + + slot_data->last_sync_time = 0; + slotlist = lappend(slotlist, slot_data); + } 15a. Unnecessary blank line in for-block. ~~~ 15b. Unnecessary assignment to 'last_sync_time' because the whole structure was palloc0 just 2 lines above. ====== src/backend/replication/logical/Makefile ====== src/backend/replication/logical/launcher.c 16. +/* + * Initial and incremental allocation size for dbids array for each + * SlotSyncWorker in dynamic shared memory i.e. we start with this size + * and once it is exhausted, dbids is rellocated with size incremented + * by ALLOC_DB_PER_WORKER + */ +#define ALLOC_DB_PER_WORKER 100 I felt it might be simpler to just separate these values instead of having to describe how you make use of the same constant for 2 meanings For example, #define DB_PER_WORKER_ALLOC_INIT 100 #define DB_PER_WORKER_ALLOC_EXTRA 100 ~~~ 17. static TimestampTz ApplyLauncherGetWorkerStartTime(Oid subid); - /* Unnecessary whitespace change. ~~~ 18. ApplyLauncherShmemInit bool found; + bool foundSlotSync; I think it is simpler to just use the same 'found' variable again. ~~ 19. ApplyLauncherShmemInit + /* Allocate shared-memory for slot-sync workers pool now */ + LogicalRepCtx->ss_workers = (SlotSyncWorker *) + ShmemInitStruct("Replication slot synchronization workers", + mul_size(max_slot_sync_workers, sizeof(SlotSyncWorker)), + &foundSlotSync); + + if (!foundSlotSync) + { + int slot; + + for (slot = 0; slot < max_slot_sync_workers; slot++) + { + SlotSyncWorker *worker = &LogicalRepCtx->ss_workers[slot]; + + memset(worker, 0, sizeof(SlotSyncWorker)); + } + } Why is the memset in a loop? Can't we just zap the whole ss_workers array in one go using that same mul_size Size? SUGGESTION Size ssw_size = mul_size(max_slot_sync_workers, sizeof(SlotSyncWorker)); if (!found) memset(LogicalRepCtx->ss_workers, 0, ssw_size); ====== .../replication/logical/logicalfuncs.c ====== src/backend/replication/logical/meson.build ====== src/backend/replication/logical/slotsync.c ====== src/backend/replication/logical/tablesync.c ====== src/backend/replication/repl_gram.y ====== src/backend/replication/repl_scanner.l ====== src/backend/replication/slot.c ====== src/backend/replication/walsender.c ====== src/backend/storage/lmgr/lwlocknames.txt ====== .../utils/activity/wait_event_names.txt ====== src/backend/utils/misc/guc_tables.c 20. 
+ { + {"max_slot_sync_workers", + PGC_SIGHUP, + REPLICATION_STANDBY, + gettext_noop("Maximum number of slots synchronization workers " + "on a standby."), + NULL, + }, + &max_slot_sync_workers, + 2, 0, MAX_SLOT_SYNC_WORKER_LIMIT, + NULL, NULL, NULL + }, + /slots synchronization/slot synchronization/ OR /slots synchronization/slot-sync/ ====== src/backend/utils/misc/postgresql.conf.sample 21. +#max_slot_sync_workers = 2 # max number of slot synchronization workers Should this comment match the guc_tables.c text. E.g should it say "... on a standby" ====== src/include/commands/subscriptioncmds.h ====== src/include/nodes/replnodes.h ====== src/include/postmaster/bgworker_internals.h ====== src/include/replication/logicallauncher.h ====== src/include/replication/logicalworker.h ====== src/include/replication/slot.h 22. + + /* + * Is standby synced slot? + */ + bool synced; } ReplicationSlotPersistentData; Comment is unclear: - does it mean "has this primary slot been synsc to standby" ? - does it mean "this is a slot created by a sync-slot worker"? - something else? ====== src/include/replication/walreceiver.h 23. +/* + * Slot's DBids receiver from remote. + */ +typedef struct WalRecvReplicationSlotDbData +{ + Oid database; + TimestampTz last_sync_time; +} WalRecvReplicationSlotDbData; + Is that comment correct? Or should it be more like "The slot's DBid received from remote.". Anyway, that comment seems more for the 'database' field only, not a structure-level comment. ~~~ 24. walrcv_get_conninfo_fn walrcv_get_conninfo; walrcv_get_senderinfo_fn walrcv_get_senderinfo; walrcv_identify_system_fn walrcv_identify_system; + walrcv_list_db_for_logical_slots_fn walrcv_list_db_for_logical_slots; walrcv_server_version_fn walrcv_server_version; walrcv_readtimelinehistoryfile_fn walrcv_readtimelinehistoryfile; walrcv_startstreaming_fn walrcv_startstreaming; This function name doesn't seem consistent with the existing names. Something like 'walrcv_get_dbinfo_for_logical_slots_fn' might be better? ====== src/include/replication/worker_internal.h 25. +typedef struct SlotSyncWorkerWatchSlot +{ + NameData slot_name; + XLogRecPtr confirmed_lsn; + int inactivity_count; +} SlotSyncWorkerWatchSlot; I did not find any reference to this typedef except in the following struct for SlotSyncWorker. So why not just make this a nested structure within 'SlotSyncWorker' instead? ~~~ 26. +typedef struct SlotSyncWorker +{ + /* Time at which this worker was launched. */ + TimestampTz launch_time; + + /* Indicates if this slot is used or free. */ + bool in_use; + + /* The slot in worker pool to which it is attached */ + int slot; + + /* Increased every time the slot is taken by new worker. */ + uint16 generation; + + /* Pointer to proc array. NULL if not running. */ + PGPROC *proc; + + /* User to use for connection (will be same as owner of subscription). */ + Oid userid; + + /* Database id to connect to. */ + Oid dbid; + + /* Count of Database ids it manages */ + uint32 dbcount; + + /* DSA for dbids */ + dsa_area *dbids_dsa; + + /* dsa_pointer for database ids it manages */ + dsa_pointer dbids_dp; + + /* Mutex to access dbids in dsa */ + slock_t mutex; + + /* Info about slot being monitored for worker's naptime purpose */ + SlotSyncWorkerWatchSlot monitor; +} SlotSyncWorker; There seems an awful lot about this struct which is common with 'LogicalRepWorker' struct. It seems a shame not to make use of the commonality instead of all the cut/paste here. E.g. 
Can it be rearranged so all these common fields are shared: - launch_time - in_use - slot - generation - proc - userid - dbid ====== src/include/storage/lwlock.h 27. LWTRANCHE_LAUNCHER_HASH, - LWTRANCHE_FIRST_USER_DEFINED + LWTRANCHE_FIRST_USER_DEFINED, + LWTRANCHE_SLOT_SYNC_DSA } BuiltinTrancheIds; Isn't 'LWTRANCHE_FIRST_USER_DEFINED' supposed to the be last enum? ====== src/test/recovery/meson.build ====== src/test/recovery/t/051_slot_sync.pl 28. + +# Copyright (c) 2021, PostgreSQL Global Development Group + +use strict; Wrong copyright date ~~~ 29. +my $node_primary = PostgreSQL::Test::Cluster->new('primary'); +my $node_phys_standby = PostgreSQL::Test::Cluster->new('phys_standby'); +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber'); 29a. Can't all the subroutines be up-front? Then this can move to be with the other node initialisation code that comets next. ~ 29b. Add a comment something like # Setup nodes ~~~ 30. +# Check conflicting status in pg_replication_slots. +sub check_slots_conflicting_status +{ + my $res = $node_phys_standby->safe_psql( + 'postgres', qq( + select bool_and(conflicting) from pg_replication_slots;)); + + is($res, 't', + "Logical slot is reported as conflicting"); +} Doesn't bool_and() mean returns false if only some but not all slots are conflicting - is that intentional?> Or is this sub-routine only expecting to test one slot, in which case maybe the SQL should include also the 'slot_name'? ~~~ 31. +$node_primary->start; +$node_primary->psql('postgres', q{SELECT pg_create_physical_replication_slot('pslot1');}); + +$node_primary->backup('backup'); + +$node_phys_standby->init_from_backup($node_primary, 'backup', has_streaming => 1); +$node_phys_standby->append_conf('postgresql.conf', q{ +synchronize_slot_names = '*' +primary_slot_name = 'pslot1' +hot_standby_feedback = off +}); +$node_phys_standby->start; + +$node_primary->safe_psql('postgres', "CREATE TABLE t1 (a int PRIMARY KEY)"); +$node_primary->safe_psql('postgres', "INSERT INTO t1 VALUES (1), (2), (3)"); The comments seem mostly to describe details about what are the expectations at each test step. IMO there also needs to be a larger "overview" comment to describe more generally *what* this is testing, and *how* it is testing it. e.g. it is hard to understand the test without being already familiar with the patch. ------ Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Aug 30, 2023 at 9:29 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> PFA new patch-set which attempts to fix these:
>

PFA v15, which implements the changes below:

1) It parses synchronize_slot_names and standby_slot_names and caches the lists to avoid repeated parsing. This parsing is done at walsender startup on the primary and at slot-sync worker startup on the standby, and then on each SIGHUP.

2) Handles slot invalidation:
2.1) If a slot is invalidated on the primary, it is now invalidated on the standby as well. The standby gets the invalidation info from the primary using a new system function 'pg_get_invalidation_cause(slotname)'.
2.2) If a slot is invalidated on the standby alone, it is dropped and recreated as per synchronize_slot_names in the next sync cycle.

3) The test file 051_slot_sync.pl is removed from patch 2 for the time being. It was testing whether the logical slot on the standby becomes conflicting once the slot on the primary is removed by 'Drop Subscription' and the WAL needed by the logical slot on the standby is flushed on the primary (with hot_standby_feedback=off). But as per the current implementation, we drop the slot on the standby as soon as the subscription is dropped on the primary, so the test case no longer serves the purpose for which it was added. A correct set of test cases will be added going forward.

4) Addresses most of the comments by Peter.

Change 1 is in patch01 along with patch02, the rest are in patch02 alone. Thank You Ajin for assisting with the above changes.

Next in the pipeline:
1) Currently it allows specifying logical slots in standby_slot_names; this should be prohibited.
2) We need to ensure that WAL is replayed on the standby before moving the slot's position to the target location received from the primary.
3) Rest of the comments upthread.

thanks
Shveta
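For item 1, the usual background-worker reload pattern would look roughly like this; ConfigReloadPending, ProcessConfigFile and WaitLatch are the standard mechanisms, while rebuild_slot_name_cache(), synchronize_slots_once() and naptime_ms are hypothetical placeholders for the patch's actual routines (shutdown handling omitted):

for (;;)
{
    if (ConfigReloadPending)
    {
        ConfigReloadPending = false;
        ProcessConfigFile(PGC_SIGHUP);
        /* re-split synchronize_slot_names / standby_slot_names here */
        rebuild_slot_name_cache();
    }

    /* one pass of slot synchronization work */
    synchronize_slots_once();

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     naptime_ms, PG_WAIT_EXTENSION);
    ResetLatch(MyLatch);
}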
Attachment
On Fri, Sep 1, 2023 at 1:59 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shveta. Here are some comments for patch v14-0002 > > The patch is large, so my code review is a WIP... more later next week... > Thanks Peter for the feedback. I have tried to address most of these in v15. Please find my response inline for the ones which I have not addressed. > ====== > GENERAL > > 1. Patch size > > The patch is 2700 lines. Is it possible to break this up into smaller > self-contained parts to make the reviews more manageable? > Currently, patches are created based on work done on primary and standby. Patch 001 for primary-side implementation and 002 for standby side. Let me think more on this and see if the changes can be segregated further. > > 26. > +typedef struct SlotSyncWorker > +{ > + /* Time at which this worker was launched. */ > + TimestampTz launch_time; > + > + /* Indicates if this slot is used or free. */ > + bool in_use; > + > + /* The slot in worker pool to which it is attached */ > + int slot; > + > + /* Increased every time the slot is taken by new worker. */ > + uint16 generation; > + > + /* Pointer to proc array. NULL if not running. */ > + PGPROC *proc; > + > + /* User to use for connection (will be same as owner of subscription). */ > + Oid userid; > + > + /* Database id to connect to. */ > + Oid dbid; > + > + /* Count of Database ids it manages */ > + uint32 dbcount; > + > + /* DSA for dbids */ > + dsa_area *dbids_dsa; > + > + /* dsa_pointer for database ids it manages */ > + dsa_pointer dbids_dp; > + > + /* Mutex to access dbids in dsa */ > + slock_t mutex; > + > + /* Info about slot being monitored for worker's naptime purpose */ > + SlotSyncWorkerWatchSlot monitor; > +} SlotSyncWorker; > > There seems an awful lot about this struct which is common with > 'LogicalRepWorker' struct. > > It seems a shame not to make use of the commonality instead of all the > cut/paste here. > > E.g. Can it be rearranged so all these common fields are shared: > - launch_time > - in_use > - slot > - generation > - proc > - userid > - dbid > > ====== > src/include/storage/lwlock.h Sure, I had this in mind along with previous comments where it was suggested to merge similar functions like WaitForReplicationWorkerAttach, WaitForSlotSyncWorkerAttach etc. That merging could only be possible if we try to merge the common part of these structures. This is WIP, will be addressed in the next version. > > 27. > LWTRANCHE_LAUNCHER_HASH, > - LWTRANCHE_FIRST_USER_DEFINED > + LWTRANCHE_FIRST_USER_DEFINED, > + LWTRANCHE_SLOT_SYNC_DSA > } BuiltinTrancheIds; > > Isn't 'LWTRANCHE_FIRST_USER_DEFINED' supposed to the be last enum? > > ====== > src/test/recovery/meson.build > > ====== > src/test/recovery/t/051_slot_sync.pl > I have currently removed this file from the patch. Please see my comments (pt 3) here: https://mail.google.com/mail/u/0/?ik=52d5778aba&view=om&permmsgid=msg-a:r-2984462571505788980 thanks Shveta > 28. > + > +# Copyright (c) 2021, PostgreSQL Global Development Group > + > +use strict; > > Wrong copyright date > > ~~~ > > 29. > +my $node_primary = PostgreSQL::Test::Cluster->new('primary'); > +my $node_phys_standby = PostgreSQL::Test::Cluster->new('phys_standby'); > +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber'); > > 29a. > Can't all the subroutines be up-front? Then this can move to be with > the other node initialisation code that comets next. > > ~ > > 29b. > Add a comment something like # Setup nodes > > ~~~ > > 30. 
> +# Check conflicting status in pg_replication_slots. > +sub check_slots_conflicting_status > +{ > + my $res = $node_phys_standby->safe_psql( > + 'postgres', qq( > + select bool_and(conflicting) from pg_replication_slots;)); > + > + is($res, 't', > + "Logical slot is reported as conflicting"); > +} > > Doesn't bool_and() mean returns false if only some but not all slots > are conflicting - is that intentional?> Or is this sub-routine only > expecting to test one slot, in which case maybe the SQL should include > also the 'slot_name'? > > ~~~ > > 31. > +$node_primary->start; > +$node_primary->psql('postgres', q{SELECT > pg_create_physical_replication_slot('pslot1');}); > + > +$node_primary->backup('backup'); > + > +$node_phys_standby->init_from_backup($node_primary, 'backup', > has_streaming => 1); > +$node_phys_standby->append_conf('postgresql.conf', q{ > +synchronize_slot_names = '*' > +primary_slot_name = 'pslot1' > +hot_standby_feedback = off > +}); > +$node_phys_standby->start; > + > +$node_primary->safe_psql('postgres', "CREATE TABLE t1 (a int PRIMARY KEY)"); > +$node_primary->safe_psql('postgres', "INSERT INTO t1 VALUES (1), (2), (3)"); > > The comments seem mostly to describe details about what are the > expectations at each test step. > > IMO there also needs to be a larger "overview" comment to describe > more generally *what* this is testing, and *how* it is testing it. > e.g. it is hard to understand the test without being already familiar > with the patch. > > ------ > Kind Regards, > Peter Smith. > Fujitsu Australia
On Fri, Aug 25, 2023 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Wait a minute ... > > From bac0fbef8b203c530e5117b0b7cfee13cfab78b9 Mon Sep 17 00:00:00 2001 > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > Date: Sat, 22 Jul 2023 10:17:48 +0000 > Subject: [PATCH v13 1/2] Allow logical walsenders to wait for physical > standbys > > @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > } > else > { > + /* > + * Before we send out the last set of changes to logical decoding > + * output plugin, wait for specified streaming replication standby > + * servers (if any) to confirm receipt of WAL upto commit_lsn. > + */ > + WaitForStandbyLSN(commit_lsn); > > OK, so we call this new function frequently enough -- once per > transaction, if I read this correctly? So ... > > +void > +WaitForStandbyLSN(XLogRecPtr wait_for_lsn) > +{ > ... > > + /* "*" means all logical walsenders should wait for physical standbys. */ > + if (strcmp(synchronize_slot_names, "*") != 0) > + { > + bool shouldwait = false; > + > + rawname = pstrdup(synchronize_slot_names); > + SplitIdentifierString(rawname, ',', &elemlist); > + > + foreach (l, elemlist) > + { > + char *name = lfirst(l); > + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) > + { > + shouldwait = true; > + break; > + } > + } > + > + pfree(rawname); > + rawname = NULL; > + list_free(elemlist); > + elemlist = NIL; > + > + if (!shouldwait) > + return; > + } > + > + rawname = pstrdup(standby_slot_names); > + SplitIdentifierString(rawname, ',', &elemlist); > > ... do we really want to be doing the GUC string parsing every time > through it? This sounds like it could be a bottleneck, or at least slow > things down. Maybe we should think about caching this somehow. > Yes, these parsed lists are now cached. Please see v15 (https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com) thanks Shveta
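Regarding the caching that replaced the per-transaction SplitIdentifierString() calls quoted above: the shape of the fix is to parse the GUC value once and re-parse only when a reload invalidates the cache. Below is a standalone sketch of just that shape; it is not the patch's code. The names are hypothetical, the comma splitting is deliberately simplistic (none of the real identifier-quoting rules), and a plain flag stands in for SIGHUP/assign-hook processing.

```
/* Standalone sketch: cache a parsed name list, re-parse only on "reload". */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

static char **cached_names = NULL;  /* cached, already-parsed list */
static int    cached_count = 0;
static bool   list_changed = true;  /* set true on SIGHUP in the real code */

/* Split a comma-separated value into the cache (simplified, no quoting rules). */
static void
cache_name_list(const char *raw)
{
    char   *copy = strdup(raw);
    char   *tok;

    for (int i = 0; i < cached_count; i++)
        free(cached_names[i]);
    free(cached_names);
    cached_names = NULL;
    cached_count = 0;

    for (tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ","))
    {
        cached_names = realloc(cached_names, (cached_count + 1) * sizeof(char *));
        cached_names[cached_count++] = strdup(tok);
    }
    free(copy);
    list_changed = false;
}

/* Per-transaction check: no string parsing unless the GUC changed. */
static bool
name_in_list(const char *guc_value, const char *name)
{
    if (list_changed)
        cache_name_list(guc_value);

    for (int i = 0; i < cached_count; i++)
        if (strcmp(cached_names[i], name) == 0)
            return true;
    return false;
}

int
main(void)
{
    printf("%d\n", name_in_list("slot_a,slot_b", "slot_b"));  /* parses once */
    printf("%d\n", name_in_list("slot_a,slot_b", "slot_c"));  /* reuses cache */
    return 0;
}
```

In the backend the splitting would of course still go through SplitIdentifierString() and the cache would be refreshed from the GUC machinery rather than a global flag; the sketch only shows why the hot path no longer touches the raw string.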
On Thu, Sep 7, 2023 at 8:29 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Aug 25, 2023 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > Wait a minute ... > > > > From bac0fbef8b203c530e5117b0b7cfee13cfab78b9 Mon Sep 17 00:00:00 2001 > > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > > Date: Sat, 22 Jul 2023 10:17:48 +0000 > > Subject: [PATCH v13 1/2] Allow logical walsenders to wait for physical > > standbys > > > > @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > } > > else > > { > > + /* > > + * Before we send out the last set of changes to logical decoding > > + * output plugin, wait for specified streaming replication standby > > + * servers (if any) to confirm receipt of WAL upto commit_lsn. > > + */ > > + WaitForStandbyLSN(commit_lsn); > > > > OK, so we call this new function frequently enough -- once per > > transaction, if I read this correctly? So ... > > > > +void > > +WaitForStandbyLSN(XLogRecPtr wait_for_lsn) > > +{ > > ... > > > > + /* "*" means all logical walsenders should wait for physical standbys. */ > > + if (strcmp(synchronize_slot_names, "*") != 0) > > + { > > + bool shouldwait = false; > > + > > + rawname = pstrdup(synchronize_slot_names); > > + SplitIdentifierString(rawname, ',', &elemlist); > > + > > + foreach (l, elemlist) > > + { > > + char *name = lfirst(l); > > + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) > > + { > > + shouldwait = true; > > + break; > > + } > > + } > > + > > + pfree(rawname); > > + rawname = NULL; > > + list_free(elemlist); > > + elemlist = NIL; > > + > > + if (!shouldwait) > > + return; > > + } > > + > > + rawname = pstrdup(standby_slot_names); > > + SplitIdentifierString(rawname, ',', &elemlist); > > > > ... do we really want to be doing the GUC string parsing every time > > through it? This sounds like it could be a bottleneck, or at least slow > > things down. Maybe we should think about caching this somehow. > > > > Yes, these parsed lists are now cached. Please see v15 > (https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com) > > thanks > Shveta Patches (v15) were no longer applying to HEAD, rebased those and addressed below along-with: 1) Fixed an issue in slots-invalidation code-path on standby. Thanks Ajin for testing the patch and finding the issue. 2) Ensure that WAL is replayed on standby before moving the slot's position to the target location received from the primary. 3) Some code restructuring in slotsync.c thanks Shveta
Attachment
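Item (2) in the v16 note above (do not move the slot's position until the standby has replayed that WAL) boils down to a guard like the one sketched below. This is a standalone illustration with stand-in types, values and function names, not the patch's code; in the real worker the "retry" is simply the next sync cycle.

```
/* Standalone sketch: only advance a synced slot up to locally replayed WAL. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the backend typedef */

/*
 * The remote (primary) slot position may only be applied locally if this
 * standby has already replayed at least that far; otherwise the worker
 * must wait and try again later.
 */
static bool
can_advance_slot(XLogRecPtr remote_confirmed_lsn, XLogRecPtr local_replay_lsn)
{
    return remote_confirmed_lsn <= local_replay_lsn;
}

int
main(void)
{
    XLogRecPtr  remote_confirmed_lsn = 0x2000300;   /* from the primary's slot */
    XLogRecPtr  local_replay_lsn = 0x2000100;       /* standby replay progress */

    if (can_advance_slot(remote_confirmed_lsn, local_replay_lsn))
        printf("advance local slot to %llX\n",
               (unsigned long long) remote_confirmed_lsn);
    else
        printf("standby has not replayed up to %llX yet; retry next cycle\n",
               (unsigned long long) remote_confirmed_lsn);
    return 0;
}
```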
Dear Shveta, I have resumed checking the thread. Here are my high-level comments. Sorry if these have already been discussed. 01. General I think documentation can be added beyond just the GUCs. How about adding examples for combinations of physical and logical replication? You can say that the physical primary can also be a publisher and that slots on the primary/standby are synchronized. 02. General standby_slot_names ensures that the physical standby is always ahead of the subscriber, but I think it may not be sufficient. There is a possibility that the primary server does not have any physical slots. In this case the physical standby may be behind the subscriber and the system may be confused when a failover occurs. Can't we specify the name of the standby via application_name or something? 03. General In this architecture, the syncslot workers are launched per db and they independently connect to the primary, right? I'm not sure whether it is efficient, but I came up with another architecture - only one worker (syncslot receiver) connects to the primary and the other workers (syncslot workers) receive info from it and do the updates. This can reduce the number of connections, so it may slightly improve the network latency. What do you think? 04. General The test file recovery/t/051_slot_sync.pl is missing. 04. ReplSlotSyncMain Does the worker have to connect to a specific database? ``` /* Connect to our database. */ BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid, MySlotSyncWorker->userid, 0); ``` 05. SlotSyncInitSlotNamesLst() "Lst" should be "List". Best Regards, Hayato Kuroda FUJITSU LIMITED
On Fri, Sep 8, 2023 at 1:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Sep 7, 2023 at 8:29 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Aug 25, 2023 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > > > Wait a minute ... > > > > > > From bac0fbef8b203c530e5117b0b7cfee13cfab78b9 Mon Sep 17 00:00:00 2001 > > > From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> > > > Date: Sat, 22 Jul 2023 10:17:48 +0000 > > > Subject: [PATCH v13 1/2] Allow logical walsenders to wait for physical > > > standbys > > > > > > @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, > > > } > > > else > > > { > > > + /* > > > + * Before we send out the last set of changes to logical decoding > > > + * output plugin, wait for specified streaming replication standby > > > + * servers (if any) to confirm receipt of WAL upto commit_lsn. > > > + */ > > > + WaitForStandbyLSN(commit_lsn); > > > > > > OK, so we call this new function frequently enough -- once per > > > transaction, if I read this correctly? So ... > > > > > > +void > > > +WaitForStandbyLSN(XLogRecPtr wait_for_lsn) > > > +{ > > > ... > > > > > > + /* "*" means all logical walsenders should wait for physical standbys. */ > > > + if (strcmp(synchronize_slot_names, "*") != 0) > > > + { > > > + bool shouldwait = false; > > > + > > > + rawname = pstrdup(synchronize_slot_names); > > > + SplitIdentifierString(rawname, ',', &elemlist); > > > + > > > + foreach (l, elemlist) > > > + { > > > + char *name = lfirst(l); > > > + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) > > > + { > > > + shouldwait = true; > > > + break; > > > + } > > > + } > > > + > > > + pfree(rawname); > > > + rawname = NULL; > > > + list_free(elemlist); > > > + elemlist = NIL; > > > + > > > + if (!shouldwait) > > > + return; > > > + } > > > + > > > + rawname = pstrdup(standby_slot_names); > > > + SplitIdentifierString(rawname, ',', &elemlist); > > > > > > ... do we really want to be doing the GUC string parsing every time > > > through it? This sounds like it could be a bottleneck, or at least slow > > > things down. Maybe we should think about caching this somehow. > > > > > > > Yes, these parsed lists are now cached. Please see v15 > > (https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com) > > > > thanks > > Shveta > > Patches (v15) were no longer applying to HEAD, rebased those and > addressed below along-with: > > 1) Fixed an issue in slots-invalidation code-path on standby. Thanks > Ajin for testing the patch and finding the issue. > 2) Ensure that WAL is replayed on standby before moving the slot's > position to the target location received from the primary. > 3) Some code restructuring in slotsync.c > > thanks > Shveta There were cfbot failures on v16 patches: --presence of 051_slot_sync.pl in meson.build even though the file is removed. --usage of uint in launcher.c Fixed above and attached v16_2_0001/0002 patches again. thanks Shveta
Attachment
On Fri, Sep 8, 2023 at 4:40 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > I resumed to check the thread. Here are my high-level comments. > Sorry if you have been already discussed. Thanks Kuroda-san for the feedback. > > 01. General > > I think the documentation can be added, not only GUCs. How about adding examples > for combinations of physical and logical replications? You can say that both of > physical primary can be publisher and slots on primary/standby are synchronized. > I did not fully understand this. Can you please state a clear example. We are only synchronizing logical replication slots in this draft and that too on physical standby from primary. So the last statement is not completely true. > 02. General > > standby_slot_names ensures that physical standby is always ahead subscriber, but I > think it may be not sufficient. There is a possibility that primary server does > not have any physical slots.So it expects a slot to be present. > In this case the physical standby may be behind the > subscriber and the system may be confused when the failover is occured. Currently there is a check in slot-sync worker which mandates that there is a physical slot present between primary and standby for this feature to proceed.So that confusion state will not arise. + /* WalRcvData is not set or primary_slot_name is not set yet */ + if (!WalRcv || WalRcv->slotname[0] == '\0') + return naptime; >Can't we specify the name of standby via application_name or something? So do you mean that in absence of a physical slot (if we plan to support that), we let primary know about standby(slots-synchronization client) through application_name? I am not sure about this. Will think more on this. I would like to know others' opinion on this as well. > > 03. General > > In this architecture, the syncslot worker is launched per db and they > independently connects to primary, right? Not completely true. Each slotsync worker is responsible for managing N dbs. Here 'N' = 'Number of distinct dbs for slots in synchronize_slot_names'/ 'number of max_slotsync_workers configured' for cases where dbcount exceeds workers configured. And if dbcount < max_slotsync_workers, then we launch only that many workers equal to dbcount and each worker manages a single db. Each worker independently connects to primary. Currently it makes a connection multiple times, I am optimizing it to make connection only once and then after each SIGHUP assuming 'primary_conninfo' may change. This change will be in the next version. >I'm not sure it is efficient, but I > come up with another architecture - only a worker (syncslot receiver)connects > to the primary and other workers (syncslot worker) receives infos from it and > updates. This can reduce the number of connections so that it may slightly > improve the latency of network. How do you think? > I feel it may help in reducing network latency, but not sure if it could be more efficient in keeping the lsns in sync. I feel it may introduce lag due to the fact that only one worker is getting all the info from primary and the actual synchronizing workers are waiting on that worker. This lag may be more when the number of slots are huge. We have run some performance tests on the design implemented currently, please have a look at emails around [1] and [2]. > 04. General > > test file recovery/t/051_slot_sync.pl is missing. > yes, it was removed. Please see point3 at [3] > 04. 
ReplSlotSyncMain > > Does the worker have to connect to the specific database? > > > > > > ``` > > /* Connect to our database. */ > > BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid, > > MySlotSyncWorker->userid, > > 0); > > ``` > > Since we are using the libpq public interface 'walrcv_exec=libpqrcv_exec' to connect to the primary, this needs a database connection. It errors out in the absence of 'MyDatabaseId'. Do you think a db-connection can have some downsides? > > > > 05. SlotSyncInitSlotNamesLst() > > > > "Lst" should be "List". > > Okay, I will change this in the next version. > > ========== > > [1]: https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com > [2]: https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com > [3]: https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com > > thanks > Shveta
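The db-to-worker distribution described in the reply above (one worker per db up to max_slotsync_workers, otherwise the dbs shared out, with a new db going to the worker handling the fewest dbs and to the first such worker on ties, as the commit message later spells out) can be made concrete with the standalone sketch below. The constant, the dbids and all names are illustrative only and combine the two descriptions given in this thread.

```
/* Standalone sketch of distributing slot databases across sync workers. */
#include <stdio.h>

#define MAX_SLOTSYNC_WORKERS 4      /* stands in for the max_slotsync_workers GUC */

int
main(void)
{
    unsigned int dbids[] = {16384, 16385, 16386, 16387, 16388, 16389};
    int          dbcount = sizeof(dbids) / sizeof(dbids[0]);

    /* Launch one worker per db, but never more than the configured maximum. */
    int          nworkers = (dbcount < MAX_SLOTSYNC_WORKERS) ? dbcount : MAX_SLOTSYNC_WORKERS;
    int          per_worker_count[MAX_SLOTSYNC_WORKERS] = {0};

    /*
     * Assign each db to the worker currently handling the fewest dbs
     * (the first such worker on ties), as the launcher is described to do.
     */
    for (int d = 0; d < dbcount; d++)
    {
        int target = 0;

        for (int w = 1; w < nworkers; w++)
            if (per_worker_count[w] < per_worker_count[target])
                target = w;

        per_worker_count[target]++;
        printf("dbid %u -> worker %d\n", dbids[d], target);
    }

    return 0;
}
```

With six dbs and four workers this prints assignments 0,1,2,3,0,1, i.e. the load difference between workers never exceeds one db.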
Dear Shveta, Sorry for the late response. > Thanks Kuroda-san for the feedback. > > > > 01. General > > > > I think the documentation can be added, not only GUCs. How about adding > examples > > for combinations of physical and logical replications? You can say that both of > > physical primary can be publisher and slots on primary/standby are > synchronized. > > > > I did not fully understand this. Can you please state a clear example. > We are only synchronizing logical replication slots in this draft and > that too on physical standby from primary. So the last statement is > not completely true. I was expecting a new subsection to be added in "Log-Shipping Standby Servers". I think we can add info like the following: * a logical replication publisher can also be replicated * For that, a physical replication slot must be defined on the primary * Then we can set up standby_slot_names (on the primary) and synchronize_slot_names (on both servers). * slots are synchronized automatically > > 02. General > > > > standby_slot_names ensures that physical standby is always ahead subscriber, > but I > > think it may be not sufficient. There is a possibility that primary server does > > not have any physical slots.So it expects a slot to be present. > > In this case the physical standby may be behind the > > subscriber and the system may be confused when the failover is occured. > > Currently there is a check in slot-sync worker which mandates that > there is a physical slot present between primary and standby for this > feature to proceed.So that confusion state will not arise. > + /* WalRcvData is not set or primary_slot_name is not set yet */ > + if (!WalRcv || WalRcv->slotname[0] == '\0') > + return naptime; Right, but I wanted to know why it is needed. One motivation seemed to be knowing the WAL location of the physical standby, but I thought that struct WalSnd.apply could also be used. Is it bad to assume that the physical walsender always exists? > >Can't we specify the name of standby via application_name or something? > > So do you mean that in absence of a physical slot (if we plan to > support that), we let primary know about standby(slots-synchronization > client) through application_name? Yes, that is what I considered. > > > > 03. General > > > > In this architecture, the syncslot worker is launched per db and they > > independently connects to primary, right? > > Not completely true. Each slotsync worker is responsible for managing > N dbs. Here 'N' = 'Number of distinct dbs for slots in > synchronize_slot_names'/ 'number of max_slotsync_workers configured' > for cases where dbcount exceeds workers configured. > And if dbcount < max_slotsync_workers, then we launch only that many > workers equal to dbcount and each worker manages a single db. Each > worker independently connects to primary. Currently it makes a > connection multiple times, I am optimizing it to make connection only > once and then after each SIGHUP assuming 'primary_conninfo' may > change. This change will be in the next version. > > > >I'm not sure it is efficient, but I > > come up with another architecture - only a worker (syncslot receiver)connects > > to the primary and other workers (syncslot worker) receives infos from it and > > updates. This can reduce the number of connections so that it may slightly > > improve the latency of network. How do you think? > > > > I feel it may help in reducing network latency, but not sure if it > could be more efficient in keeping the lsns in sync.
I feel it may > introduce lag due to the fact that only one worker is getting all the > info from primary and the actual synchronizing workers are waiting on > that worker. This lag may be more when the number of slots are huge. > We have run some performance tests on the design implemented > currently, please have a look at emails around [1] and [2]. Thank you for the explanation! Yeah, I agree that the other point might become a bottleneck. It could be revisited in the future, but currently we do not have to consider it... > > 04. ReplSlotSyncMain > > > > Does the worker have to connect to the specific database? > > > > > > ``` > > /* Connect to our database. */ > > > BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid, > > > MySlotSyncWorker->userid, > > > 0); > > ``` > > Since we are using libpq public interface 'walrcv_exec=libpqrcv_exec' > to connect to primary, this needs database connection. It errors out > in the absence of 'MyDatabaseId'. Do you think db-connection can have > some downsides? > I considered that we should not grant privileges to access more data than necessary. It might be better if we can avoid connecting to a specific database. But I'm not sure that we should have to add a new walreceiver API to handle it. FYI, I checked the physical walreceiver for reference, but it is not a background worker, so it did not help. And the following are further comments. 1. I considered the combination of this feature with initial data sync, and found an issue. What do you think? Assume that the name of the subscription is specified in "synchronize_slot_names". The synchronization of each table is separated into two transactions: 1. In the first transaction, a logical replication slot (pg_XXX_sync_XXX...) is created and tuples are COPYed. 2. In the second transaction, changes since the first transaction are streamed and applied. If the primary crashes between 1 and 2 and the standby is promoted, the tablesync worker would execute "START_REPLICATION SLOT pg_XXX_sync_XXX..." against the promoted server, but fail because such a slot does not exist. Is this a problem we should solve? The above can be reproduced by adding a sleep(). 2. Do we have to add some rules in the "Configuration Settings" section? 3. You can run pgindent at your convenience. Best Regards, Hayato Kuroda FUJITSU LIMITED
On Mon, Sep 11, 2023 at 9:49 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Sep 8, 2023 at 4:40 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Shveta, > > > > I resumed to check the thread. Here are my high-level comments. > > Sorry if you have been already discussed. > > Thanks Kuroda-san for the feedback. > > > > 01. General > > > > I think the documentation can be added, not only GUCs. How about adding examples > > for combinations of physical and logical replications? You can say that both of > > physical primary can be publisher and slots on primary/standby are synchronized. > > > > I did not fully understand this. Can you please state a clear example. > We are only synchronizing logical replication slots in this draft and > that too on physical standby from primary. So the last statement is > not completely true. > > > 02. General > > > > standby_slot_names ensures that physical standby is always ahead subscriber, but I > > think it may be not sufficient. There is a possibility that primary server does > > not have any physical slots.So it expects a slot to be present. > > In this case the physical standby may be behind the > > subscriber and the system may be confused when the failover is occured. > > Currently there is a check in slot-sync worker which mandates that > there is a physical slot present between primary and standby for this > feature to proceed.So that confusion state will not arise. > + /* WalRcvData is not set or primary_slot_name is not set yet */ > + if (!WalRcv || WalRcv->slotname[0] == '\0') > + return naptime; > > >Can't we specify the name of standby via application_name or something? > > So do you mean that in absence of a physical slot (if we plan to > support that), we let primary know about standby(slots-synchronization > client) through application_name? I am not sure about this. Will think > more on this. I would like to know others' opinion on this as well. > > > > > 03. General > > > > In this architecture, the syncslot worker is launched per db and they > > independently connects to primary, right? > > Not completely true. Each slotsync worker is responsible for managing > N dbs. Here 'N' = 'Number of distinct dbs for slots in > synchronize_slot_names'/ 'number of max_slotsync_workers configured' > for cases where dbcount exceeds workers configured. > And if dbcount < max_slotsync_workers, then we launch only that many > workers equal to dbcount and each worker manages a single db. Each > worker independently connects to primary. Currently it makes a > connection multiple times, I am optimizing it to make connection only > once and then after each SIGHUP assuming 'primary_conninfo' may > change. This change will be in the next version. > > > >I'm not sure it is efficient, but I > > come up with another architecture - only a worker (syncslot receiver)connects > > to the primary and other workers (syncslot worker) receives infos from it and > > updates. This can reduce the number of connections so that it may slightly > > improve the latency of network. How do you think? > > > > I feel it may help in reducing network latency, but not sure if it > could be more efficient in keeping the lsns in sync. I feel it may > introduce lag due to the fact that only one worker is getting all the > info from primary and the actual synchronizing workers are waiting on > that worker. This lag may be more when the number of slots are huge. 
> We have run some performance tests on the design implemented > currently, please have a look at emails around [1] and [2]. > > > 04. General > > > > test file recovery/t/051_slot_sync.pl is missing. > > > > yes, it was removed. Please see point3 at [3] > > > > 04. ReplSlotSyncMain > > > > Does the worker have to connect to the specific database? > > > > > > ``` > > /* Connect to our database. */ > > BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid, > > MySlotSyncWorker->userid, > > 0); > > ``` > > Since we are using libpq public interface 'walrcv_exec=libpqrcv_exec' > to connect to primary, this needs database connection. It errors out > in the absence of 'MyDatabaseId'. Do you think db-connection can have > some downsides? > > > > > 05. SlotSyncInitSlotNamesLst() > > > > "Lst" should be "List". > > > > Okay, I will change this in the next version. > > ========== > > [1]: https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com > [2]: https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com > [3]: https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com > > thanks > Shveta PFA v17. It has below changes: 1) There was a common portion between SlotSync worker and LogicalRep worker structures. The common portion is now moved to WorkerHeader. The common functions are merged. 2) Connection to primary is made once in the beginning in both slotSync worker as well as launcher. Earlier it was before each sync cycle. 3) SpinLock Removed. Earlier LWlock was used for shared-memory access by workers and then there was extra Spinlock for dbids access in DSM, which is removed now. LWLock alone seems enough to maintain the consistency. 4) In 'alter system standby_slot_names', we can not give non-existing slot-names or logical slots now. Earlier it was accepting everything. This specific change is in patch1, rest in patch2. Thanks Ajin for working on 1. Next, I plan to review patch01 and the existing feedback about it. Until now focus was patch02. thanks Shveta
Attachment
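The struct consolidation mentioned in change (1) of the v17 note above can be pictured with the sketch below. It is only an illustration of the idea, not the patch's definitions: the stand-in typedefs replace backend types so the snippet compiles on its own, the field lists are truncated, and the member name 'hdr' follows a later review suggestion rather than the patch (which uses 'header').

```
/* Standalone sketch: fields shared by logical-rep and slot-sync workers
 * pulled into a common header that both structs embed. */
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* stand-in for the backend typedef */
typedef unsigned int Oid;       /* stand-in for the backend typedef */
typedef struct PGPROC PGPROC;   /* opaque here; a real struct in the backend */

typedef struct WorkerHeader
{
    TimestampTz launch_time;    /* when this worker was launched */
    bool        in_use;         /* is this pool entry taken? */
    uint16_t    generation;     /* bumped each time the entry is reused */
    PGPROC     *proc;           /* backend proc, NULL if not running */
    Oid         userid;         /* user to connect as */
    Oid         dbid;           /* database to connect to */
} WorkerHeader;

/* Logical replication apply worker: header plus apply-specific state. */
typedef struct LogicalRepWorker
{
    WorkerHeader hdr;
    Oid         subid;          /* subscription this worker serves */
    /* ... remaining apply-worker fields elided ... */
} LogicalRepWorker;

/* Slot-sync worker: the same header plus its per-db bookkeeping. */
typedef struct SlotSyncWorker
{
    WorkerHeader hdr;
    int         slot;           /* index in the worker pool */
    uint32_t    dbcount;        /* number of databases it manages */
    /* ... dsa pointers and monitoring info elided ... */
} SlotSyncWorker;

int
main(void)
{
    LogicalRepWorker lw = {0};
    SlotSyncWorker sw = {0};

    /* Common code can now touch either worker through the shared header. */
    lw.hdr.in_use = true;
    sw.hdr.in_use = true;
    return (lw.hdr.in_use && sw.hdr.in_use) ? 0 : 1;
}
```

This is what lets functions like the two WaitFor*WorkerAttach variants mentioned earlier in the thread collapse into one routine operating on the shared header.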
On Wed, Sep 6, 2023 at 2:18 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Sep 1, 2023 at 1:59 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi Shveta. Here are some comments for patch v14-0002 > > > > The patch is large, so my code review is a WIP... more later next week... > > > > Thanks Peter for the feedback. I have tried to address most of these > in v15. Please find my response inline for the ones which I have not > addressed. > > > > > 26. > > +typedef struct SlotSyncWorker > > +{ > > + /* Time at which this worker was launched. */ > > + TimestampTz launch_time; > > + > > + /* Indicates if this slot is used or free. */ > > + bool in_use; > > + > > + /* The slot in worker pool to which it is attached */ > > + int slot; > > + > > + /* Increased every time the slot is taken by new worker. */ > > + uint16 generation; > > + > > + /* Pointer to proc array. NULL if not running. */ > > + PGPROC *proc; > > + > > + /* User to use for connection (will be same as owner of subscription). */ > > + Oid userid; > > + > > + /* Database id to connect to. */ > > + Oid dbid; > > + > > + /* Count of Database ids it manages */ > > + uint32 dbcount; > > + > > + /* DSA for dbids */ > > + dsa_area *dbids_dsa; > > + > > + /* dsa_pointer for database ids it manages */ > > + dsa_pointer dbids_dp; > > + > > + /* Mutex to access dbids in dsa */ > > + slock_t mutex; > > + > > + /* Info about slot being monitored for worker's naptime purpose */ > > + SlotSyncWorkerWatchSlot monitor; > > +} SlotSyncWorker; > > > > There seems an awful lot about this struct which is common with > > 'LogicalRepWorker' struct. > > > > It seems a shame not to make use of the commonality instead of all the > > cut/paste here. > > > > E.g. Can it be rearranged so all these common fields are shared: > > - launch_time > > - in_use > > - slot > > - generation > > - proc > > - userid > > - dbid > > > > Sure, I had this in mind along with previous comments where it was > suggested to merge similar functions like > WaitForReplicationWorkerAttach, WaitForSlotSyncWorkerAttach etc. That > merging could only be possible if we try to merge the common part of > these structures. This is WIP, will be addressed in the next version. > This has been addressed in version-17 now. thanks Shveta
On Wed, Sep 13, 2023 at 4:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v17. It has below changes: > @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, ReorderBufferTXN *txn, } else { + /* + * Before we send out the last set of changes to logical decoding + * output plugin, wait for specified streaming replication standby + * servers (if any) to confirm receipt of WAL upto commit_lsn. + */ + WaitForStandbyLSN(commit_lsn); It seems the first patch has a wait logic for every commit. I think it is better to integrate this wait with WalSndWaitForWal() as suggested by Andres in his email[1]. [1] - https://www.postgresql.org/message-id/20220207204557.74mgbhowydjco4mh%40alap3.anarazel.de -- With Regards, Amit Kapila.
Hi. Here are some review comments for v17-0002. This is a WIP and a long way from complete, but I wanted to send what I have so far (while it is still current with your latest posted patches). ====== 1. GENERAL - loop variable declaration There are some code examples like below where the loop variable is declared within the for. AFAIK this style of declaration is atypical for the PG source. + /* Find unused worker slot. */ + for (int i = 0; i < max_slotsync_workers; i++) Search/Replace. ~~~ 2. GENERAL - from primary There are multiple examples in messages and comments that say "from primary". I felt most would be better to say "from the primary". Search/Replace. ~~~ 3. GENERAL - pg_indent There are lots of examples of function arguments like "* worker" (with space) which changed to "*worker" (without space) in v16 and then changed back to "* worker" with space in v17. Can all these toggles be cleaned up by running pg_indent? ====== Commit message. 4. This patch attempts to implement logical replication slots synchronization from primary server to physical standby so that logical subscribers are not blocked after failover. Now-on, all the logical replication slots created on primary (assuming configurations are appropriate) are automatically created on physical standbys and are synced periodically. This has been acheived by starting slot-sync worker(s) on standby server which pings primary at regular intervals to get the logical slots information and create/update the slots locally. SUGGESTION (just minor rewording) This patch implements synchronization of logical replication slots from the primary server to the physical standby so that logical subscribers are not blocked after failover. All the logical replication slots on the primary (assuming configurations are appropriate) are automatically created on the physical standbys and are synced periodically. Slot-sync worker(s) on the standby server ping the primary at regular intervals to get the necessary logical slot information and create/update the slots locally. ~ 5. For max number of slot-sync workers on standby, new GUC max_slotsync_workers has been added, default value and max value is kept at 2 and 50 respectively. This parameter can only be set at server start. 5a. SUGGESTION (minor rewording) A new GUC 'max_slotsync_workers' defines the maximum number of slot-sync workers on the standby: default value = 2, max value = 50. This parameter can only be set at server start ~ 5b. Actually, I think mentioning the values 2 and 50 here might be too much detail, but I left it anyway. Consider removing that. ~~~ 6. Now replication launcher on physical standby queries primary to get list of dbids which belong to slots mentioned in GUC 'synchronize_slot_names'. Once it gets the dbids, if dbids < max_slotsync_workers, it starts only that many workers and if dbids > max_slotsync_workers, it starts max_slotsync_workers and divides the work equally among them. Each worker is then responsible to keep on syncing the concerned logical slots belonging to the DBs assigned to it. ~ 6a. SUGGESTION (first sentence) Now the replication launcher on the physical standby queries primary to get the list of dbids that belong to the... ~ 6b. "concerned" ?? ~~~ 7. Let us say slots mentioned in 'synchronize_slot_names' on primary belongs to 4 DBs and say 'max_slotsync_workers' is 4, then a new worker will be launched for each db. 
If a new logical slot with a different DB is found by replication launcher, it will assign this new db to the worker handling the minimum number of dbs currently (or first worker in case of equal count). ~ /Let us say/For example, let's say/ ~~~ 8. The naptime of worker is tuned as per the activity on primary. Each worker starts with naptime of 10ms and if no activity is observed on primary for some time, then naptime is increased to 10sec. And if activity is observed again, naptime is reduced back to 10ms. Each worker does it by choosing one slot (first one assigned to it) for monitoring purpose. If there is no change in lsn of that slot for say over 10 sync-checks, naptime is increased to 10sec and as soon as a change is observed, naptime is reduced back to 10ms. ~ /as per the activity on primary/according to the activity on the primary/ /is observed on primary/is observed on the primary/ /Each worker does it by choosing one slot/Each worker uses one slot/ ~~~ 9. If there is any change in synchronize_slot_names, then the slots which are no longer part of it or the ones which no longer exist on primary will be dropped by slot-sync workers on physical standbys. ~ 9a. /on primary/on the primary/ /which no longer exist/that no longer exist/ ~ 9b. I didn't really understand why this says "or the ones which no longer exist". IIUC (from prior paragraph) such slots would already be invalidated/removed by the sync-slot worker in due course -- i.e. we don't need to wait for some change to the 'synchronize_slot_names' list to trigger that deletion, right? ====== doc/src/sgml/config.sgml 10. + <varlistentry id="guc-max-slotsync-workers" xreflabel="max_slotsync_workers"> + <term><varname>max_slotsync_workers</varname> (<type>integer</type>) + <indexterm> + <primary><varname>max_slotsync_workers</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Specifies maximum number of slot synchronization workers. + </para> + <para> + Slot synchronization workers are taken from the pool defined by + <varname>max_worker_processes</varname>. + </para> + <para> + The default value is 2. This parameter can only be set at server + start. + </para> + </listitem> + </varlistentry> This looks OK, but IMO there also needs some larger description (here or elsewhere?) about this feature more generally. Otherwise, why would the user change the 'max_slotsync_workers' when there is nothing to say "slot synchronization workers" are for? ====== src/backend/postmaster/bgworker.c 11. { "ApplyWorkerMain", ApplyWorkerMain }, + { + "ReplSlotSyncMain", ReplSlotSyncMain + }, { "ParallelApplyWorkerMain", ParallelApplyWorkerMain }, ~ I thought this entry point name/function should include the word "Worker" same as for the others. ====== .../libpqwalreceiver/libpqwalreceiver.c 12. +/* + * Get DB info for logical slots + * + * It gets the DBIDs for slot_names from primary. The list obatined has no + * duplicacy of DBIds. + */ +static List * +libpqrcv_get_dbinfo_for_logical_slots(WalReceiverConn *conn, + const char *slot_names) 12a. typo /obatined/ SUGGESTION The returned list has no duplicates. ~ 12b. I did not recognise any part of the function logic ensuring no duplicates are returned. IIUC it is actually within the logic of LIST_DBID_FOR_LOGICAL_SLOTS that this is handled, so maybe the comment can mention that. ~~~ 13. 
libpqrcv_get_dbinfo_for_logical_slots + if (PQnfields(res) != 1) + { + int nfields = PQnfields(res); + + PQclear(res); + ereport(ERROR, + (errmsg("invalid response from primary server"), + errdetail("Could not get list of slots: got %d fields, " + "expected %d fields.", + nfields, 1))); + } Something seems not right about the message. The "expected" part plurality is wrong, and if it can only be 1 then why use substitution? ====== src/backend/replication/logical/Makefile OK ====== src/backend/replication/logical/launcher.c 14. slot_sync_worker_stop +static void +slot_sync_worker_stop(SlotSyncWorker *worker) +{ + + Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); ... + LWLockAcquire(SlotSyncWorkerLock, LW_SHARED); + } + +} Unnecessary whitespace at the top and bottom of this function. ~~~ 15. slot_sync_worker_launch_or_reuse + /* Find unused worker slot. */ + for (int i = 0; i < max_slotsync_workers; i++) loop variable declaration. ~~~ 16. slot_sync_worker_launch_or_reuse + if (!worker) + { + for (int i = 0; i < max_slotsync_workers; i++) loop variable declaration. ~~~ 17. slot_sync_remove_obsolete_dbs + /* Traverse slot-sync-workers to validate the DBs */ + for (int widx = 0; widx < max_slotsync_workers; widx++) + { loop variable declaration. ~ 18. + for (int dbidx = 0; dbidx < worker->dbcount;) + { loop variable declaration ~ 19. + for (int i = dbidx; i < worker->dbcount; i++) + { loop variable declaration ~ 20. + /* If dbcount for any worker has become 0, shut it down */ + for (int widx = 0; widx < max_slotsync_workers; widx++) + { loop variable declaration ~ 21. + } + +} + Unnecessary whitespace at the end of the function body ~~~ 22. ApplyLauncherStartSubs +static void +ApplyLauncherStartSubs(long *wait_time) +{ Missing function comment. ====== .../replication/logical/logicalfuncs.c OK ====== src/backend/replication/logical/meson.build OK ====== src/backend/replication/logical/slotsync.c 23. +/*------------------------------------------------------------------------- + * slotsync.c + * PostgreSQL worker for synchronizing slots to a standby from primary + * + * Copyright (c) 2016-2018, PostgreSQL Global Development Group + * Wrong copyright date? ~~~ 24. + * This file contains the code for slot-sync worker on physical standby that + * fetches the logical replication slots information from primary server + * (PrimaryConnInfo) and creates the slots on standby and synchronizes them + * periodically. It synchronizes only the slots configured in + * 'synchronize_slot_names'. SUGGESTION This file contains the code for slot-sync workers on physical standby to fetch logical replication slot information from the primary server (PrimaryConnInfo), create the slots on the standby, and synchronize them periodically. Slot-sync workers only synchronize slots configured in 'synchronize_slot_names'. ~~~ 25. + * It takes a nap of WORKER_DEFAULT_NAPTIME before every next synchronization. + * If there is no acitivity observed on primary for sometime, it increases the + * naptime to WORKER_INACTIVITY_NAPTIME and as soon as any activity is observed, + * it brings back the naptime to default value. SUGGESTION (2nd sentence) If there is no activity observed on the primary for some time, the naptime is increased to WORKER_INACTIVITY_NAPTIME, but if any activity is observed, the naptime reverts to the default value. ~~~ 26. 
+typedef struct RemoteSlot +{ + char *name; + char *plugin; + char *database; + bool two_phase; + bool conflicting; + XLogRecPtr restart_lsn; + XLogRecPtr confirmed_lsn; + TransactionId catalog_xmin; + + /* RS_INVAL_NONE if valid, or the reason of invalidation */ + ReplicationSlotInvalidationCause invalidated; +} RemoteSlot; This deserves at least a struct-level comment. ~~~ 27. +/* + * Inactivity Threshold Count before increasing naptime of worker. + * + * If the lsn of slot being monitored did not change for these many times, + * then increase naptime of current worker from WORKER_DEFAULT_NAPTIME to + * WORKER_INACTIVITY_NAPTIME. + */ +#define WORKER_INACTIVITY_THRESHOLD 10 I felt this constant would be better expressed as a time interval instead of a magic number. You can easily derive that loop count anyway in the code logic. e.g. here the comment would be "If the lsn of the slot being monitored did not change for XXXms then...". ~~~ 28. wait_for_primary_slot_catchup +/* + * Wait for remote slot to pass localy reserved position. + */ +static void +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, char *slot_name, + XLogRecPtr min_lsn) /localy/locally/ ~~~ 29. wait_for_primary_slot_catchup + ereport(ERROR, + (errmsg("slot \"%s\" disapeared from primary", + slot_name))); /disapeared/disappeared/ ~~~ 30. ReplSlotSyncMain + if (!dsa) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("could not map dynamic shared memory " + "segment for slot-sync worker"))); + + + /* Primary initialization is complete. Now, attach to our slot. */ Unnecessary double whitespace. ====== src/backend/replication/logical/tablesync.c OK ====== src/backend/replication/repl_gram.y OK ====== src/backend/replication/repl_scanner.l OK ====== src/backend/replication/slot.c ====== src/backend/replication/slotfuncs.c 31. +/* + * SQL function for getting invalidation cause of a slot. + * + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; + * returns NULL if slot with given name is not found. + */ +Datum +pg_get_invalidation_cause(PG_FUNCTION_ARGS) +{ + Name name = PG_GETARG_NAME(0); + ReplicationSlotInvalidationCause cause; + int slotno; + + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); + for (slotno = 0; slotno < max_replication_slots; slotno++) + { + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[slotno]; + if (strcmp(NameStr(s->data.name), NameStr(*name)) == 0) + { + cause = s->data.invalidated; + PG_RETURN_INT16(cause); + } + } + LWLockRelease(ReplicationSlotControlLock); + + PG_RETURN_NULL(); +} 31a. There seems no check if the slot actually is invalidated. I guess in that case the function just returns the enum value RS_INVAL_NONE, but should that be mentioned in the function header comment? ~ 31b. Seems a poor choice of function name -- does not even have the word "slot" in the name (??). ~ 31c. IMO it is better to have a blankline after the declaration in the loop. ~ 31b. Might be simpler just to remove that 'cause' variable. It's not doing much. ====== src/backend/replication/walsender.c 32. ListSlotDatabaseOIDs +/* + * Handle the LIST_SLOT_DATABASE_OIDS command. + */ +static void +ListSlotDatabaseOIDs(ListDBForLogicalSlotsCmd *cmd) 32a. The function-level comment seems too terse. Just saying "handle the command" does not describe what this function is actually doing and how it does it. ~ 32b. Is "LIST_SLOT_DATABASE_OIDS" even the correct name? I don't see that anywhere else in this patch. 
AFAICT it should be "LIST_DBID_FOR_LOGICAL_SLOTS". ~ 33. ListSlotDatabaseOIDs - comments The comments in the body of this function are inconsistent begining uppercase/lowercase ~ 34. ListSlotDatabaseOIDs - sorting/logic Maybe explain better the reason for having the qsort and other logic. TBH, I was not sure of the necessity for the names lists and the sorting and bsearch logic. AFAICT these are all *only* used to check for uniqueness and existence of the slot name. So I was wondering if a hashmap keyed by the slot name might be more appropriate and also simpler than all this list sorting/searching. ~~ 35. ListSlotDatabaseOIDs + for (int slotno = 0; slotno < max_replication_slots; slotno++) + { loop variable declaration ====== src/backend/storage/lmgr/lwlock.c OK ====== src/backend/storage/lmgr/lwlocknames.txt OK ====== .../utils/activity/wait_event_names.txt TODO ====== src/backend/utils/misc/guc_tables.c OK ====== src/backend/utils/misc/postgresql.conf.sample 36. # primary to streaming replication standby server +#max_slotsync_workers = 2 # max number of slot synchronization workers on a standby IMO it is better to say "maximum" instead of "max" in the comment. (make sure the GUC description text is identical) ====== src/include/catalog/pg_proc.dat 37. +{ oid => '6312', descr => 'get invalidate cause of a replication slot', + proname => 'pg_get_invalidation_cause', provolatile => 's', proisstrict => 't', + prorettype => 'int2', proargtypes => 'name', + prosrc => 'pg_get_invalidation_cause' }, 37a. SUGGESTION (descr) what caused the replication slot to become invalid ~ 37b 'pg_get_invalidation_cause' seemed like a poor function name because it doesn't have any context -- not even the word "slot" in it. ====== src/include/commands/subscriptioncmds.h OK ====== src/include/nodes/replnodes.h OK ====== src/include/postmaster/bgworker_internals.h 38. #define MAX_PARALLEL_WORKER_LIMIT 1024 +#define MAX_SLOT_SYNC_WORKER_LIMIT 50 Consider SLOTSYNC instead of SLOT_SYNC for consistency with other names of this worker. ====== OK ====== src/include/replication/logicalworker.h 39. extern void ApplyWorkerMain(Datum main_arg); extern void ParallelApplyWorkerMain(Datum main_arg); extern void TablesyncWorkerMain(Datum main_arg); +extern void ReplSlotSyncMain(Datum main_arg); The name is not consistent with others nearby. At least it should include the word "Worker" like everything else does. ====== src/include/replication/slot.h OK ====== src/include/replication/walreceiver.h 40. +/* + * Slot's DBid related data + */ +typedef struct WalRcvRepSlotDbData +{ + Oid database; /* Slot's DBid received from remote */ + TimestampTz last_sync_time; /* The last time we tried to launch sync + * worker for above Dbid */ +} WalRcvRepSlotDbData; + Is that comment about field 'last_sync_time' correct? I thought this field is the last time the slot was synced -- not the last time the worker was launched. ====== src/include/replication/worker_internal.h 41. - /* User to use for connection (will be same as owner of subscription). */ + /* User to use for connection (will be same as owner of subscription + * in case of LogicalRep worker). */ Oid userid; +} WorkerHeader; 41a. This is not the normal style for a multi-line comment. ~ 41b. I wondered if the name "WorkerHeader" is just a bit *too* generic and might cause future trouble because of the vague name. ~~~ 42. +typedef struct LogicalRepWorker +{ + WorkerHeader header; + + /* What type of worker is this? */ + LogicalRepWorkerType type; /* Subscription id for the worker. 
*/ Oid subid; @@ -77,7 +84,7 @@ typedef struct LogicalRepWorker * would be created for each transaction which will be deleted after the * transaction is finished. */ - FileSet *stream_fileset; + struct FileSet *stream_fileset; /* * PID of leader apply worker if this slot is used for a parallel apply @@ -96,6 +103,32 @@ typedef struct LogicalRepWorker TimestampTz reply_time; } LogicalRepWorker; 42a. I suggest having some struct-level comments. ~ 42b. The field name 'header' is propagated all over the place. So, IMO calling it 'hdr' instead of 'header' might be slightly less intrusive. I think there are lots of precedents for calling headers as 'hdr'. ~ 42c. Why was the FileSet field changed to struct FileSet? Aren't the struct/typedef defined in the same place? ~~~ 43. +typedef struct SlotSyncWorker +{ + WorkerHeader header; + + /* The slot in worker pool to which it is attached */ + int slot; + + /* Count of Database ids it manages */ + uint32 dbcount; + + /* DSA for dbids */ + dsa_area *dbids_dsa; + + /* dsa_pointer for database ids it manages */ + dsa_pointer dbids_dp; + + /* Info about slot being monitored for worker's naptime purpose */ + struct SlotSyncWorkerWatchSlot + { + NameData slot_name; + XLogRecPtr confirmed_lsn; + int inactivity_count; + } monitoring_info; + +} SlotSyncWorker; 43a. I suggest having some struct-level comments. ~ 43b. IMO it will avoid ambiguities to be more explicit in the comments instead of just saying "it" everywhere. + /* The slot in worker pool to which it is attached */ + /* Count of Database ids it manages */ + /* dsa_pointer for database ids it manages */ ~ 43c. There is inconsistent wording and case in these comments. Just pick one term to use everywhere. "Database ids" "database ids" "dbids" ~~~ 44. GENERAL = restructuring of common structs in worker_internal.h The field name 'header' is propagated all over the place. It is OK, and I guess there is no choice, but IMO calling it 'hdr' instead of 'header' might be slightly less intrusive. I think there are lots of precedents for calling headers as 'hdr'. ====== src/include/storage/lwlock.h ====== src/tools/pgindent/typedefs.list 45. Missing the typedef WorkerHeader? ====== Kind Regards, Peter Smith. Fujitsu Australia
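Review point 27 above asks for the naptime bump to be driven by an elapsed-time threshold rather than a loop counter. A standalone sketch of that variant is below; the 10ms/10s values mirror the naptimes described for the patch, while the 1-second inactivity threshold and all function/variable names are purely illustrative.

```
/* Standalone sketch: time-based naptime backoff for a slot-sync worker. */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define WORKER_DEFAULT_NAPTIME_MS       10      /* active: poll every 10ms */
#define WORKER_INACTIVITY_NAPTIME_MS    10000   /* idle: poll every 10s */
#define WORKER_INACTIVITY_THRESHOLD_MS  1000    /* bump after 1s without LSN movement */

static long
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Decide the next naptime from when the monitored slot's LSN last moved. */
static long
compute_naptime(bool lsn_advanced, long *last_activity_ms)
{
    long now = now_ms();

    if (lsn_advanced)
    {
        *last_activity_ms = now;            /* activity seen: snap back to fast polling */
        return WORKER_DEFAULT_NAPTIME_MS;
    }

    if (now - *last_activity_ms >= WORKER_INACTIVITY_THRESHOLD_MS)
        return WORKER_INACTIVITY_NAPTIME_MS;    /* idle long enough: slow down */

    return WORKER_DEFAULT_NAPTIME_MS;
}

int
main(void)
{
    long last_activity = now_ms();

    printf("naptime: %ldms\n", compute_naptime(true, &last_activity));
    printf("naptime: %ldms\n", compute_naptime(false, &last_activity));
    return 0;
}
```

Expressed this way, the "10 sync-checks" constant disappears: if a check count is ever needed it can be derived as threshold divided by naptime, which is the review's point about avoiding the magic number.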
On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > Sorry for the late response. > > > Thanks Kuroda-san for the feedback. > > > > > > 01. General > > > > > > I think the documentation can be added, not only GUCs. How about adding > > examples > > > for combinations of physical and logical replications? You can say that both of > > > physical primary can be publisher and slots on primary/standby are > > synchronized. > > > > > > > I did not fully understand this. Can you please state a clear example. > > We are only synchronizing logical replication slots in this draft and > > that too on physical standby from primary. So the last statement is > > not completely true. > > I expected to add a new subsection in "Log-Shipping Standby Servers". I think we > can add like following infos: > > * logical replication publisher can be also replicated > * For that, a physical repliation slot must be defined on primar > * Then we can set up standby_slot_names(on primary) and synchronize_slot_names > (on both server). > * slots are synchronized automatically Sure. I am trying to find the right place in this section to add this info. I will try to address this in coming versions. > > > > 02. General > > > > > > standby_slot_names ensures that physical standby is always ahead subscriber, > > but I > > > think it may be not sufficient. There is a possibility that primary server does > > > not have any physical slots.So it expects a slot to be present. > > > In this case the physical standby may be behind the > > > subscriber and the system may be confused when the failover is occured. > > > > Currently there is a check in slot-sync worker which mandates that > > there is a physical slot present between primary and standby for this > > feature to proceed.So that confusion state will not arise. > > + /* WalRcvData is not set or primary_slot_name is not set yet */ > > + if (!WalRcv || WalRcv->slotname[0] == '\0') > > + return naptime; > > Right, but I wanted to know why it is needed. One motivation seemed to know the > WAL location of physical standby, but I thought that struct WalSnd.apply could > be also used. Is it bad to assume that the physical walsender always exists? > We do not plan to target this case where physical slot is not created between primary and physical-standby in the first draft. In such a case, slot-synchronization will be skipped for the time being. We can extend this functionality (if needed) later. > > >Can't we specify the name of standby via application_name or something? > > > > So do you mean that in absence of a physical slot (if we plan to > > support that), we let primary know about standby(slots-synchronization > > client) through application_name? > > Yes, it is what I considered. > > > > > > > 03. General > > > > > > In this architecture, the syncslot worker is launched per db and they > > > independently connects to primary, right? > > > > Not completely true. Each slotsync worker is responsible for managing > > N dbs. Here 'N' = 'Number of distinct dbs for slots in > > synchronize_slot_names'/ 'number of max_slotsync_workers configured' > > for cases where dbcount exceeds workers configured. > > And if dbcount < max_slotsync_workers, then we launch only that many > > workers equal to dbcount and each worker manages a single db. Each > > worker independently connects to primary. 
Currently it makes a > > connection multiple times, I am optimizing it to make connection only > > once and then after each SIGHUP assuming 'primary_conninfo' may > > change. This change will be in the next version. > > > > > > >I'm not sure it is efficient, but I > > > come up with another architecture - only a worker (syncslot receiver)connects > > > to the primary and other workers (syncslot worker) receives infos from it and > > > updates. This can reduce the number of connections so that it may slightly > > > improve the latency of network. How do you think? > > > > > > > I feel it may help in reducing network latency, but not sure if it > > could be more efficient in keeping the lsns in sync. I feel it may > > introduce lag due to the fact that only one worker is getting all the > > info from primary and the actual synchronizing workers are waiting on > > that worker. This lag may be more when the number of slots are huge. > > We have run some performance tests on the design implemented > > currently, please have a look at emails around [1] and [2]. > > Thank you for teaching! Yeah, I agreed that another point might be a bottleneck. > It could be recalled in future, but currently we do not have to consider... > > > > 04. ReplSlotSyncMain > > > > > > Does the worker have to connect to the specific database? > > > > > > > > > ``` > > > /* Connect to our database. */ > > > > > BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid, > > > > > MySlotSyncWorker->userid, > > > > > 0); > > > ``` > > > > Since we are using libpq public interface 'walrcv_exec=libpqrcv_exec' > > to connect to primary, this needs database connection. It errors out > > in the absence of 'MyDatabaseId'. Do you think db-connection can have > > some downsides? > > > > I considered that we should not grant privileges to access data more than necessary. > It might be better if we can avoid to connect to the specific database. But I'm > not sure that we should have to add new walreceiver API to handle it. FYI, I > checked the physical walreceiver to refer it, but it was not background worker > so that it was no meaning. > If this needs to be done, we need to have new walreceiver APIs around that (no db connection) which are currently exposed to front-end through libpq-fe.h but not exposed to the backend. I am not sure about the feasibility for that and the effort needed there. So currently we plan to go by db-connection as allowed by current libpq-walreceiver APIs. > > And followings are further comments. > > 1. > I considered the combination with the feature and initial data sync, and found an > issue. How do you think? Assuming that the name of subscription is specified as > "synchronize_slot_names". > > A synchronization of each tables is separated into two transactions: > > 1. In a first transaction, a logical replication slot (pg_XXX_sync_XXX...)is > created and tuples are COPYd. > 2. In a second transaction, changes from the first transaction are streamed by > and applied. > > If the primary crashed between 1 and 2 and standby is promoted, the tablesync > worker would execute "START_REPLICATION SLOT pg_XXX_sync_XXX..." to promoted > server, but fail because such a slot does not exist. > > Is this a problem we should solve? Above can be reproduced by adding sleep(). I will try to reproduce this scenario to understand it better. Allow me some more time. > > 2. > Do we have to add some rules in "Configuration Settings" section? Sure. I will review this further and see if we can add anything there. > > 3. 
> You can run pgindent in your timing. > I have done it in v17. > > Best Regards, > Hayato Kuroda > FUJITSU LIMITED
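To make the dbcount/max_slotsync_workers split described in the reply to comment 03 above concrete, here is a small self-contained sketch (plain C, not patch code; the function name and the example numbers are made up for illustration):

```c
#include <stdio.h>

/*
 * Sketch only: with 'dbcount' distinct databases among the slots in
 * synchronize_slot_names and 'max_workers' slot-sync workers configured,
 * each launched worker manages either floor(dbcount/nworkers) databases
 * or one more.
 */
static int
dbs_for_worker(int dbcount, int max_workers, int worker_no)
{
	int		nworkers = dbcount < max_workers ? dbcount : max_workers;
	int		base = dbcount / nworkers;
	int		extra = dbcount % nworkers;

	/* the first 'extra' workers take one additional database */
	return worker_no < extra ? base + 1 : base;
}

int
main(void)
{
	/* e.g. 10 databases, max_slotsync_workers = 4 => 3, 3, 2, 2 */
	for (int w = 0; w < 4; w++)
		printf("worker %d manages %d dbs\n", w, dbs_for_worker(10, 4, w));
	return 0;
}
```

So with 10 databases in synchronize_slot_names and max_slotsync_workers = 4, four workers are launched managing 3, 3, 2 and 2 databases respectively; with 3 databases, only 3 workers are launched, one per database.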
On Wed, Sep 13, 2023 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Sep 13, 2023 at 4:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v17. It has below changes: > > > > @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, > ReorderBufferTXN *txn, > } > else > { > + /* > + * Before we send out the last set of changes to logical decoding > + * output plugin, wait for specified streaming replication standby > + * servers (if any) to confirm receipt of WAL upto commit_lsn. > + */ > + WaitForStandbyLSN(commit_lsn); > > It seems the first patch has a wait logic for every commit. I think it > is better to integrate this wait with WalSndWaitForWal() as suggested > by Andres in his email[1]. > > [1] - https://www.postgresql.org/message-id/20220207204557.74mgbhowydjco4mh%40alap3.anarazel.de > > -- Sure Amit. PFA v18. It addresses below: 1) patch001: wait for physical-standby confirmation logic is now integrated with WalSndWaitForWal(). Now walsender waits for physical standby's confirmation to take changes upto RecentFlushPtr in WalSndWaitForWal(). This allows walsender to send the changes to logical subscribers one by one which are already covered in RecentFlushPtr without needing to wait on every commit for physical standby confirmation. 2) if synchronize_slot_names set on physical standby has physical slot name, primary's walsender on receiving that will error out. This is currently done in ListSlotDatabaseOIDs(), but it needs to be moved to logic where standby will send synchronize_slot_names to be set on primary and primary will validate that first. GUC synchronize_slot_names will be removed from primary. This arrangement to be done in next version. 3) Peter's comment dated Sep15. thanks Shveta
Attachment
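To make the v18 behaviour described above concrete, here is a heavily simplified sketch of the wait that WalSndWaitForWal() could perform before letting a logical walsender send changes up to RecentFlushPtr. This is not the patch's actual code: standby_slot_names_list is assumed to be a cached list of the GUC value, and spinlock handling around the slot fields is omitted.

```c
/*
 * Sketch only: wait until every physical slot named in standby_slot_names
 * has confirmed receipt of WAL up to 'wait_for_lsn'.
 * PhysicalConfirmReceivedLocation() advances a physical slot's restart_lsn,
 * so that is the field checked here.
 */
static void
WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
{
	for (;;)
	{
		bool		all_caught_up = true;
		ListCell   *lc;

		foreach(lc, standby_slot_names_list)	/* assumed cached GUC list */
		{
			ReplicationSlot *slot =
				SearchNamedReplicationSlot((const char *) lfirst(lc), true);

			if (slot == NULL || !SlotIsPhysical(slot) ||
				slot->data.restart_lsn < wait_for_lsn)
			{
				all_caught_up = false;
				break;
			}
		}

		if (all_caught_up)
			return;

		CHECK_FOR_INTERRUPTS();

		/* XXX: 1s polling retry as in v18; a wakeup mechanism would be nicer */
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 1000L,
						 WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION);
		ResetLatch(MyLatch);
	}
}
```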
On Fri, Sep 15, 2023 at 2:22 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi. Here are some review comments for v17-0002. > Thanks Peter for the feedback. I have addressed most of these in v18 except 2. Please find my comments for the ones not addressed. > This is a WIP and a long way from complete, but I wanted to send what > I have so far (while it is still current with your latest posted > patches). > > ====== > > 34. ListSlotDatabaseOIDs - sorting/logic > > Maybe explain better the reason for having the qsort and other logic. > > TBH, I was not sure of the necessity for the names lists and the > sorting and bsearch logic. AFAICT these are all *only* used to check > for uniqueness and existence of the slot name. So I was wondering if a > hashmap keyed by the slot name might be more appropriate and also > simpler than all this list sorting/searching. > Pending. I will revisit this soon and will let you know more on this. IMO, it was done to optimize the search as slot_names list could be pretty huge if max_replication_slots is set to high value. > ~~ > > 35. ListSlotDatabaseOIDs > > + for (int slotno = 0; slotno < max_replication_slots; slotno++) > + { > > loop variable declaration > > > ====== > src/backend/storage/lmgr/lwlock.c > OK > > ====== > src/backend/storage/lmgr/lwlocknames.txt > OK > > ====== > .../utils/activity/wait_event_names.txt > TODO > > ====== > src/backend/utils/misc/guc_tables.c > OK > > ====== > src/backend/utils/misc/postgresql.conf.sample > > 36. > # primary to streaming replication standby server > +#max_slotsync_workers = 2 # max number of slot synchronization > workers on a standby > > IMO it is better to say "maximum" instead of "max" in the comment. > > (make sure the GUC description text is identical) > > ====== > src/include/catalog/pg_proc.dat > > 37. > +{ oid => '6312', descr => 'get invalidate cause of a replication slot', > + proname => 'pg_get_invalidation_cause', provolatile => 's', > proisstrict => 't', > + prorettype => 'int2', proargtypes => 'name', > + prosrc => 'pg_get_invalidation_cause' }, > > 37a. > SUGGESTION (descr) > what caused the replication slot to become invalid > > ~ > > 37b > 'pg_get_invalidation_cause' seemed like a poor function name because > it doesn't have any context -- not even the word "slot" in it. > > ====== > src/include/commands/subscriptioncmds.h > OK > > ====== > src/include/nodes/replnodes.h > OK > > ====== > src/include/postmaster/bgworker_internals.h > > 38. > #define MAX_PARALLEL_WORKER_LIMIT 1024 > +#define MAX_SLOT_SYNC_WORKER_LIMIT 50 > > Consider SLOTSYNC instead of SLOT_SYNC for consistency with other > names of this worker. > > ====== > OK > > ====== > src/include/replication/logicalworker.h > > 39. > extern void ApplyWorkerMain(Datum main_arg); > extern void ParallelApplyWorkerMain(Datum main_arg); > extern void TablesyncWorkerMain(Datum main_arg); > +extern void ReplSlotSyncMain(Datum main_arg); > > The name is not consistent with others nearby. At least it should > include the word "Worker" like everything else does. > > ====== > src/include/replication/slot.h > OK > > ====== > src/include/replication/walreceiver.h > > 40. > +/* > + * Slot's DBid related data > + */ > +typedef struct WalRcvRepSlotDbData > +{ > + Oid database; /* Slot's DBid received from remote */ > + TimestampTz last_sync_time; /* The last time we tried to launch sync > + * worker for above Dbid */ > +} WalRcvRepSlotDbData; > + > > > Is that comment about field 'last_sync_time' correct? 
I thought this > field is the last time the slot was synced -- not the last time the > worker was launched. Sorry for confusion. Comment is correct, the name is misleading. I have changed the name in v18. > > ====== > src/include/replication/worker_internal.h > > 41. > - /* User to use for connection (will be same as owner of subscription). */ > + /* User to use for connection (will be same as owner of subscription > + * in case of LogicalRep worker). */ > Oid userid; > +} WorkerHeader; > > 41a. > > This is not the normal style for a multi-line comment. > > ~ > > 41b. > I wondered if the name "WorkerHeader" is just a bit *too* generic and > might cause future trouble because of the vague name. > I agree. Can you please suggest a better name for it? thanks Shveta
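As an illustration of the hashmap idea in comment 34, here is a minimal sketch using the backend's dynahash API; the entry struct and function names are invented for the example and are not taken from the patch:

```c
/*
 * Sketch only: track slot names in a hash table keyed by name instead of
 * sorting a list and using bsearch for uniqueness/existence checks.
 */
typedef struct SlotNameEntry
{
	char		slotname[NAMEDATALEN];	/* hash key */
} SlotNameEntry;

static HTAB *
build_slot_name_hash(void)
{
	HASHCTL		ctl;

	ctl.keysize = NAMEDATALEN;
	ctl.entrysize = sizeof(SlotNameEntry);

	return hash_create("slot name hash", max_replication_slots, &ctl,
					   HASH_ELEM | HASH_STRINGS);
}

/* uniqueness/existence check then becomes a single hash_search() call */
static bool
slot_name_seen_before(HTAB *htab, const char *name)
{
	bool		found;

	(void) hash_search(htab, name, HASH_ENTER, &found);
	return found;
}
```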
On Tue, Sep 19, 2023 at 10:29 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Sep 15, 2023 at 2:22 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi. Here are some review comments for v17-0002. > > > > Thanks Peter for the feedback. I have addressed most of these in v18 > except 2. Please find my comments for the ones not addressed. > > > This is a WIP and a long way from complete, but I wanted to send what > > I have so far (while it is still current with your latest posted > > patches). > > > > ====== > > > > > 34. ListSlotDatabaseOIDs - sorting/logic > > > > Maybe explain better the reason for having the qsort and other logic. > > > > TBH, I was not sure of the necessity for the names lists and the > > sorting and bsearch logic. AFAICT these are all *only* used to check > > for uniqueness and existence of the slot name. So I was wondering if a > > hashmap keyed by the slot name might be more appropriate and also > > simpler than all this list sorting/searching. > > > > Pending. I will revisit this soon and will let you know more on this. > IMO, it was done to optimize the search as slot_names list could be > pretty huge if max_replication_slots is set to high value. > > > ~~ > > > > 35. ListSlotDatabaseOIDs > > > > + for (int slotno = 0; slotno < max_replication_slots; slotno++) > > + { > > > > loop variable declaration > > > > > > ====== > > src/backend/storage/lmgr/lwlock.c > > OK > > > > ====== > > src/backend/storage/lmgr/lwlocknames.txt > > OK > > > > ====== > > .../utils/activity/wait_event_names.txt > > TODO > > > > ====== > > src/backend/utils/misc/guc_tables.c > > OK > > > > ====== > > src/backend/utils/misc/postgresql.conf.sample > > > > 36. > > # primary to streaming replication standby server > > +#max_slotsync_workers = 2 # max number of slot synchronization > > workers on a standby > > > > IMO it is better to say "maximum" instead of "max" in the comment. > > > > (make sure the GUC description text is identical) > > > > ====== > > src/include/catalog/pg_proc.dat > > > > 37. > > +{ oid => '6312', descr => 'get invalidate cause of a replication slot', > > + proname => 'pg_get_invalidation_cause', provolatile => 's', > > proisstrict => 't', > > + prorettype => 'int2', proargtypes => 'name', > > + prosrc => 'pg_get_invalidation_cause' }, > > > > 37a. > > SUGGESTION (descr) > > what caused the replication slot to become invalid > > > > ~ > > > > 37b > > 'pg_get_invalidation_cause' seemed like a poor function name because > > it doesn't have any context -- not even the word "slot" in it. > > > > ====== > > src/include/commands/subscriptioncmds.h > > OK > > > > ====== > > src/include/nodes/replnodes.h > > OK > > > > ====== > > src/include/postmaster/bgworker_internals.h > > > > 38. > > #define MAX_PARALLEL_WORKER_LIMIT 1024 > > +#define MAX_SLOT_SYNC_WORKER_LIMIT 50 > > > > Consider SLOTSYNC instead of SLOT_SYNC for consistency with other > > names of this worker. > > > > ====== > > OK > > > > ====== > > src/include/replication/logicalworker.h > > > > 39. > > extern void ApplyWorkerMain(Datum main_arg); > > extern void ParallelApplyWorkerMain(Datum main_arg); > > extern void TablesyncWorkerMain(Datum main_arg); > > +extern void ReplSlotSyncMain(Datum main_arg); > > > > The name is not consistent with others nearby. At least it should > > include the word "Worker" like everything else does. > > > > ====== > > src/include/replication/slot.h > > OK > > > > ====== > > src/include/replication/walreceiver.h > > > > 40. 
> > +/* > > + * Slot's DBid related data > > + */ > > +typedef struct WalRcvRepSlotDbData > > +{ > > + Oid database; /* Slot's DBid received from remote */ > > + TimestampTz last_sync_time; /* The last time we tried to launch sync > > + * worker for above Dbid */ > > +} WalRcvRepSlotDbData; > > + > > > > > > Is that comment about field 'last_sync_time' correct? I thought this > > field is the last time the slot was synced -- not the last time the > > worker was launched. > > Sorry for confusion. Comment is correct, the name is misleading. I > have changed the name in v18. > > > > > ====== > > src/include/replication/worker_internal.h > > > > 41. > > - /* User to use for connection (will be same as owner of subscription). */ > > + /* User to use for connection (will be same as owner of subscription > > + * in case of LogicalRep worker). */ > > Oid userid; > > +} WorkerHeader; > > > > 41a. > > > > This is not the normal style for a multi-line comment. > > > > ~ > > > > 41b. > > I wondered if the name "WorkerHeader" is just a bit *too* generic and > > might cause future trouble because of the vague name. > > > > I agree. Can you please suggest a better name for it? > > > thanks > Shveta Currently in patch001, synchronize_slot_names is a GUC on both primary and physical standby. This GUC tells which all logical slots need to be synced on physical standbys from the primary. Ideally it should be a GUC on physical standby alone and each physical standby should be able to communicate the value to the primary (considering the value may vary for different physical replicas of the same primary). The primary on the other hand should be able to take UNION of these values and let the logical walsenders (belonging to the slots in UNION synchronize_slots_names) wait for physical standbys for confirmation before sending those changes to logical subscribers. The intent is logical subscribers should never be ahead of physical standbys. So in order to implement this i.e. each physical standby communicating synchronize_slot_names individually to primary, we need to maintain the resultant/union value in shared-memory on primary so that each of the logical walsenders can read these values. For the sake of less complexity around this involved shared-memory, it will be good to have synchronize_slot_names as PGC_POSTMASTER GUC parameter on physical standby rather than a PGC_SIGHUP one. Making it PGC_POSTMASTER on physical standby will make primary aware of the fact that slot-sync connection from physical standby is going down and thus we now need to invalidate the UNION synchronize_slot_names and compute it fresh from rest of the connections from physical-standby's which are still alive. Also when any new connection for slot-sync purpose comes from physical standby, we need to compute the UNION synchronize_slot_names list again. The synchronize_slots_names invalidation mechanism on primary will become connection based. Any thoughts? Does PGC_POSTMASTER over PGC_SIGHUP seem a reasonable choice here? thanks Shveta
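Purely to visualize the shared-memory bookkeeping being proposed here (none of these structures exist in the posted patches; the layout and names are assumptions):

```c
/*
 * Sketch only: per-standby entry the primary could keep in shared memory so
 * that logical walsenders can compute the union of synchronize_slot_names
 * reported by all connected physical standbys.  The entry would be
 * invalidated when that standby's slot-sync connection goes away, as
 * discussed above.
 */
typedef struct StandbySyncSlotNames
{
	bool		in_use;			/* slot-sync connection currently alive? */
	NameData	standby_slot;	/* physical slot of the reporting standby */
	int			nslots;			/* number of slot names reported */
	NameData	slot_names[FLEXIBLE_ARRAY_MEMBER];	/* reported slot names */
} StandbySyncSlotNames;
```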
On Thu, Sep 21, 2023 at 9:16 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Sep 19, 2023 at 10:29 AM shveta malik <shveta.malik@gmail.com> wrote: > > Currently in patch001, synchronize_slot_names is a GUC on both primary > and physical standby. This GUC tells which all logical slots need to > be synced on physical standbys from the primary. Ideally it should be > a GUC on physical standby alone and each physical standby should be > able to communicate the value to the primary (considering the value > may vary for different physical replicas of the same primary). The > primary on the other hand should be able to take UNION of these values > and let the logical walsenders (belonging to the slots in UNION > synchronize_slots_names) wait for physical standbys for confirmation > before sending those changes to logical subscribers. The intent is > logical subscribers should never be ahead of physical standbys. > Before getting into the details of 'synchronize_slot_names', I would like to know whether we really need the second GUC 'standby_slot_names'. Can't we simply allow all the logical wal senders corresponding to 'synchronize_slot_names' to wait for just the physical standby(s) (physical slot corresponding to such physical standby) that have sent ' synchronize_slot_names'list? We should have one physical standby slot corresponding to one physical standby. -- With Regards, Amit Kapila.
PFA v19 patches with the below changes: 1) Now for slot synchronization to work, the user must specify dbname in primary_conninfo on physical standbys. This dbname is used by the slot-sync worker a) for its own connection to a db (this db connection is needed by the libpqwalreceiver APIs) b) to connect to the primary in order to get slot-info. In the absence of this dbname in primary_conninfo, the slot-sync worker will error out. 2) slotsync_worker_stop() is now merged into logicalrep_worker_stop_internal(). Some other changes are also made as per Peter's suggestions. 3) There was a bug in patch001 wherein the wrong lsn position was passed to WaitForStandbyConfirmation (record-loc instead of RecentFlushPtr), leading to the logical subscriber getting ahead of physical standbys in some cases. It is fixed now. This will most probably fix the cfbot failure. The first 2 changes are in patch0002 and the third one is in patch001. Thank you, Ajin, for working on 1 and 2. thanks Shveta
Attachment
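As a concrete example of change 1), the standby would need its connection string to carry a dbname, roughly like this (host, user and dbname are placeholder values):

```
# postgresql.conf on the physical standby (illustrative values)
primary_conninfo = 'host=primary.example.com port=5432 user=replicator dbname=postgres'
```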
Hi, Thanks for all the work that has been done on this feature, and sorry to have been quiet on it for so long. On 9/18/23 12:22 PM, shveta malik wrote: > On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: >> Right, but I wanted to know why it is needed. One motivation seemed to know the >> WAL location of physical standby, but I thought that struct WalSnd.apply could >> be also used. Is it bad to assume that the physical walsender always exists? >> > > We do not plan to target this case where physical slot is not created > between primary and physical-standby in the first draft. In such a > case, slot-synchronization will be skipped for the time being. We can > extend this functionality (if needed) later. > I do think it's needed to extend this functionality. Having physical slot created sounds like a (too?) strong requirement as: - It has not been a requirement for Logical decoding on standby so that could sounds weird to require it for sync slot (while it's not allowed to logical decode from sync slots) - One could want to limit the WAL space used on the primary It seems that the "skipping sync as primary_slot_name not set." warning message is emitted every 10ms, that seems too verbose to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Thanks for all the work that has been done on this feature, and sorry > to have been quiet on it for so long. > > On 9/18/23 12:22 PM, shveta malik wrote: > > On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > >> Right, but I wanted to know why it is needed. One motivation seemed to know the > >> WAL location of physical standby, but I thought that struct WalSnd.apply could > >> be also used. Is it bad to assume that the physical walsender always exists? > >> > > > > We do not plan to target this case where physical slot is not created > > between primary and physical-standby in the first draft. In such a > > case, slot-synchronization will be skipped for the time being. We can > > extend this functionality (if needed) later. > > > > I do think it's needed to extend this functionality. Having physical slot > created sounds like a (too?) strong requirement as: > > - It has not been a requirement for Logical decoding on standby so that could sounds weird > to require it for sync slot (while it's not allowed to logical decode from sync slots) > There is a difference here that we also need to prevent removal of rows required by sync_slots. That could be achieved by physical slot (and hot_standby_feedback). So, having a requirement to have physical slot doesn't sound too unreasonable to me. Otherwise, we need to invent some new mechanism of having some sort of placeholder slot to avoid removal of required rows. I guess we can always extend the functionality in later version as Shveta mentioned. Now, if we have somewhat simpler way to achieve prevention of removal of rows then it is fine otherwise let's focus on getting other parts correct considering this is already a reasonably big and complex patch. Thanks for looking into this work and your feedback will definetely help in moving this work forward. -- With Regards, Amit Kapila.
FYI -- v19 failed to apply cleanly with the latest HEAD. [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v19-0001-Allow-logical-walsenders-to-wait-for-physical-st.patch error: patch failed: src/test/recovery/meson.build:44 error: src/test/recovery/meson.build: patch does not apply ------ Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > Thanks for all the work that has been done on this feature, and sorry > to have been quiet on it for so long. Thanks for looking into this. > > On 9/18/23 12:22 PM, shveta malik wrote: > > On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu) > > <kuroda.hayato@fujitsu.com> wrote: > >> Right, but I wanted to know why it is needed. One motivation seemed to know the > >> WAL location of physical standby, but I thought that struct WalSnd.apply could > >> be also used. Is it bad to assume that the physical walsender always exists? > >> > > > > We do not plan to target this case where physical slot is not created > > between primary and physical-standby in the first draft. In such a > > case, slot-synchronization will be skipped for the time being. We can > > extend this functionality (if needed) later. > > > > I do think it's needed to extend this functionality. Having physical slot > created sounds like a (too?) strong requirement as: > > - It has not been a requirement for Logical decoding on standby so that could sounds weird > to require it for sync slot (while it's not allowed to logical decode from sync slots) > > - One could want to limit the WAL space used on the primary > > It seems that the "skipping sync as primary_slot_name not set." warning message is emitted > every 10ms, that seems too verbose to me. > You are right, the warning msg is way too frequent. I will optimize it in the next version. thanks Shveta
On Mon, Sep 25, 2023 at 12:14 PM Peter Smith <smithpb2250@gmail.com> wrote: > > FYI -- v19 failed to apply cleanly with the latest HEAD. > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > ../patches_misc/v19-0001-Allow-logical-walsenders-to-wait-for-physical-st.patch > error: patch failed: src/test/recovery/meson.build:44 > error: src/test/recovery/meson.build: patch does not apply > > ------ Rebased the patch; the updated version is attached as v19_2. regards, Ajin Cherian
Attachment
On Fri, Sep 22, 2023 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Sep 21, 2023 at 9:16 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Sep 19, 2023 at 10:29 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > Currently in patch001, synchronize_slot_names is a GUC on both primary > > and physical standby. This GUC tells which all logical slots need to > > be synced on physical standbys from the primary. Ideally it should be > > a GUC on physical standby alone and each physical standby should be > > able to communicate the value to the primary (considering the value > > may vary for different physical replicas of the same primary). The > > primary on the other hand should be able to take UNION of these values > > and let the logical walsenders (belonging to the slots in UNION > > synchronize_slots_names) wait for physical standbys for confirmation > > before sending those changes to logical subscribers. The intent is > > logical subscribers should never be ahead of physical standbys. > > > > Before getting into the details of 'synchronize_slot_names', I would > like to know whether we really need the second GUC > 'standby_slot_names'. Can't we simply allow all the logical wal > senders corresponding to 'synchronize_slot_names' to wait for just the > physical standby(s) (physical slot corresponding to such physical > standby) that have sent ' synchronize_slot_names'list? We should have > one physical standby slot corresponding to one physical standby. > yes, with the new approach (to be implemented next) where we plan to send synchronize_slot_names from each physical standby to primary, the standby_slot_names GUC should no longer be needed on primary. The physical standbys sending requests should automatically become the ones to be waited for confirmation on the primary. thanks Shveta
Hi, On 9/23/23 3:38 AM, Amit Kapila wrote: > On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Thanks for all the work that has been done on this feature, and sorry >> to have been quiet on it for so long. >> >> On 9/18/23 12:22 PM, shveta malik wrote: >>> On Wed, Sep 13, 2023 at 4:48 PM Hayato Kuroda (Fujitsu) >>> <kuroda.hayato@fujitsu.com> wrote: >>>> Right, but I wanted to know why it is needed. One motivation seemed to know the >>>> WAL location of physical standby, but I thought that struct WalSnd.apply could >>>> be also used. Is it bad to assume that the physical walsender always exists? >>>> >>> >>> We do not plan to target this case where physical slot is not created >>> between primary and physical-standby in the first draft. In such a >>> case, slot-synchronization will be skipped for the time being. We can >>> extend this functionality (if needed) later. >>> >> >> I do think it's needed to extend this functionality. Having physical slot >> created sounds like a (too?) strong requirement as: >> >> - It has not been a requirement for Logical decoding on standby so that could sounds weird >> to require it for sync slot (while it's not allowed to logical decode from sync slots) >> > > There is a difference here that we also need to prevent removal of > rows required by sync_slots. That could be achieved by physical slot > (and hot_standby_feedback). So, having a requirement to have physical > slot doesn't sound too unreasonable to me. Otherwise, we need to > invent some new mechanism of having some sort of placeholder slot to > avoid removal of required rows. Thinking about it, I wonder if removal of required rows is even possible given that: - we don't allow to logical decode from a sync slot - sync slot catalog_xmin <= its primary counter part catalog_xmin - its primary counter part prevents rows removal thanks to its own catalog_xmin - a sync slot is removed as soon as its primary counter part is removed In that case I'm not sure how rows removal on the primary could lead to remove rows required by a sync slot. Am I missing something? Do you have a scenario in mind? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Dear Ajin, Shveta, Thank you for rebasing the patch set! Here are new comments for v19_2-0001. 01. WalSndWaitForStandbyNeeded() ``` if (SlotIsPhysical(MyReplicationSlot)) return false; ``` Is there a possibility that physical walsenders call this function? IIUC following is a stacktrace for the function, so the only logical walsenders use it. If so, it should be Assert() instead of an if statement. logical_read_xlog_page() WalSndWaitForWal() WalSndWaitForStandbyNeeded() 02. WalSndWaitForStandbyNeeded() Can we set shouldwait in SlotSyncInitConfig()? synchronize_slot_names_list is searched whenever the function is called, but it is not changed automatically. If the slotname is compared with the list in the SlotSyncInitConfig(), the liner search can be reduced. 03. WalSndWaitForStandbyConfirmation() We should add ProcessRepliesIfAny() during the loop, otherwise the walsender overlooks the death of an apply worker. 04. WalSndWaitForStandbyConfirmation() Not sure, but do we have to return early if walsenders got PROCSIG_WALSND_INIT_STOPPING signal? I thought that if physical walsenders get stuck, logical walsenders wait forever. At that time we cannot stop the primary server even if "pg_ctl stop" is executed. 05. SlotSyncInitConfig() Why don't we free the memory for rawname, old standby_slot_names_list, and synchronize_slot_names_list? They seem to be overwritten. 06. SlotSyncInitConfig() Both physical and logical walsenders call the func, but physical one do not use lists, right? If so, can we add a quick exit for physical walsenders? Or, we should carefully remove where physical calls it. 07. StartReplication() I think we do not have to call SlotSyncInitConfig(). Alternative approach is written in above. 08. the other Also, I found the unexpected behavior after both 0001 and 0002 were applied. Was it normal or not? 1. constructed below setup (ensured that logical slot existed on secondary) 2. stopped the primary 3. promoted the secondary server 4. disabled a subscription once 5. changed the connection string for subscriber 6. Inserted data to new primary 7. enabled the subscription again 8. got an ERROR: replication slot "sub" does not exist I expected that the logical replication would be restarted, but it could not. Was it real issue or my fault? The error would appear in secondary.log. ``` Setup: primary--->secondary | | subscriber ``` Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Hi, On 9/25/23 10:44 AM, Drouvot, Bertrand wrote: > Hi, > > On 9/23/23 3:38 AM, Amit Kapila wrote: >> On Fri, Sep 22, 2023 at 6:01 PM Drouvot, Bertrand >> <bertranddrouvot.pg@gmail.com> wrote: >> There is a difference here that we also need to prevent removal of >> rows required by sync_slots. That could be achieved by physical slot >> (and hot_standby_feedback). So, having a requirement to have physical >> slot doesn't sound too unreasonable to me. Otherwise, we need to >> invent some new mechanism of having some sort of placeholder slot to >> avoid removal of required rows. > > Thinking about it, I wonder if removal of required rows is even possible > given that: > > - we don't allow to logical decode from a sync slot > - sync slot catalog_xmin <= its primary counter part catalog_xmin > - its primary counter part prevents rows removal thanks to its own catalog_xmin > - a sync slot is removed as soon as its primary counter part is removed > > In that case I'm not sure how rows removal on the primary could lead to remove rows > required by a sync slot. Am I missing something? Do you have a scenario in mind? Please forget the above questions, it's in fact pretty easy to remove rows on the primary that would be needed by a sync slot. I do agree that having a requirement to have physical slot does not sound unreasonable then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Here are some more review comments for the patch v19-0002. This is a WIP.... these review comments are all for the file slotsync.c ====== src/backend/replication/logical/slotsync.c 1. wait_for_primary_slot_catchup + WalRcvExecResult *res; + TupleTableSlot *slot; + Oid slotRow[1] = {LSNOID}; + StringInfoData cmd; + bool isnull; + XLogRecPtr restart_lsn; + + for (;;) + { + int rc; I could not recognize a reason why 'rc' is declared within the loop, but none of the other local variables are. Personally, I'd declare all variables at the deepest scope (e.g. inside the for loop). ~~~ 2. get_local_synced_slot_names +/* + * Get list of local logical slot names which are synchronized from + * primary and belongs to one of the DBs passed in. + */ +static List * +get_local_synced_slot_names(Oid *dbids) +{ IIUC, this function gets called only from the drop_obsolete_slots() function. But I thought this list of local slot names (i.e. for the dbids that this worker is handling) would be something that perhaps could the initialized one time for the worker, instead of it being re-calculated every single time the slots processing/dropping happens. Isn't the current code expending too much effort recalculating over and over but giving back the same list every time? ~~~ 3. get_local_synced_slot_names + for (int i = 0; i < max_replication_slots; i++) + { + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; + + /* Check if it is logical synchronized slot */ + if (s->in_use && SlotIsLogical(s) && s->data.synced) + { + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) + { Loop variables are not declared in the common PG code way. ~~~ 4. slot_exists_locally +static bool +slot_exists_locally(List *remote_slots, ReplicationSlot *local_slot, + bool *locally_invalidated) +{ + ListCell *cell; + + foreach(cell, remote_slots) + { + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); + + if (strcmp(remote_slot->name, NameStr(local_slot->data.name)) == 0) + { + /* + * if remote slot is marked as non-conflicting (i.e. not + * invalidated) but local slot is marked as invalidated, then set + * the bool. + */ + if (!remote_slot->conflicting && + SlotIsLogical(local_slot) && + local_slot->data.invalidated != RS_INVAL_NONE) + *locally_invalidated = true; + + return true; + } + } + + return false; +} Why is there a SlotIsLogical(local_slot) check buried in this function? How is slot_exists_locally() getting called with a non-logical local_slot? Shouldn't that have been screened out long before here? ~~~ 5. use_slot_in_query +static bool +use_slot_in_query(char *slot_name, Oid *dbids) There are multiple non-standard for-loop variable declarations in this function. ~~~ 6. compute_naptime + * The first slot managed by each worker is chosen for monitoring purpose. + * If the lsn of that slot changes during each sync-check time, then the + * nap time is kept at regular value of WORKER_DEFAULT_NAPTIME_MS. + * When no lsn change is observed for WORKER_INACTIVITY_THRESHOLD_MS + * time, then the nap time is increased to WORKER_INACTIVITY_NAPTIME_MS. + * This nap time is brought back to WORKER_DEFAULT_NAPTIME_MS as soon as + * lsn change is observed. 6a. /regular value/the regular value/ /for WORKER_INACTIVITY_THRESHOLD_MS time/within the threshold period (WORKER_INACTIVITY_THRESHOLD_MS)/ ~ 6b. /as soon as lsn change is observed./as soon as another lsn change is observed./ ~~~ 7. + * The caller is supposed to ignore return-value of 0. The 0 value is returned + * for the slots other that slot being monitored. 
+ */ +static long +compute_naptime(RemoteSlot *remote_slot) This rule about the returning 0 seemed hacky to me. IMO this would be a better API to pass long *naptime (which this function either updates or doesn't update, depending on this being the "monitored" slot. Knowing the current naptime is also useful to improve the function logic (see the next review comment below). Also, since this function is really only toggling naptime between 2 values, it would be helpful to assert that Assert(*naptime == WORKER_DEFAULT_NAPTIME_MS || *naptime == WORKER_INACTIVITY_NAPTIME_MS); ~~~ 8. + if (NameStr(MySlotSyncWorker->monitoring_info.slot_name)[0] == '\0') + { + /* + * First time, just update the name and lsn and return regular + * nap time. Start comparison from next time onward. + */ + strcpy(NameStr(MySlotSyncWorker->monitoring_info.slot_name), + remote_slot->name); I wasn't sure why it was necessary to identify the "monitoring" slot by name. Why doesn't the compute_naptime just get called only for the 1st slot found in the tuple loop instead of all the strcmp business trying to match monitor names? And, if the monitored slot gets "dropped", then so what; next time another slot will be the first tuple so will automatically take its place, right? ~~~ 9. + /* + * If new received lsn (remote one) is different from what we have in + * our local slot, then update last_update_time. + */ + if (MySlotSyncWorker->monitoring_info.confirmed_lsn != + remote_slot->confirmed_lsn) + MySlotSyncWorker->monitoring_info.last_update_time = now; + + MySlotSyncWorker->monitoring_info.confirmed_lsn = + remote_slot->confirmed_lsn; Doesn't it make more sense to also put that 'confirmed_lsn' assignment under the same condition? e.g. No need to overwrite the same value again. ~~~ 10. + /* If the inactivity time reaches the threshold, increase nap time */ + if (TimestampDifferenceExceeds(MySlotSyncWorker->monitoring_info.last_update_time, + now, WORKER_INACTIVITY_THRESHOLD_MS)) + return WORKER_INACTIVITY_NAPTIME_MS; + else + return WORKER_DEFAULT_NAPTIME_MS; + } Somehow this feels overcomplicated to me. In reality, the naptime is only toggling between 2 values (DEFAULT and INACTIVITY) so we should never need to be testing TimestampDifferenceExceeds again and again on subsequent calls (there might be 1000s of them) Once naptime is WORKER_INACTIVITY_NAPTIME_MS we know to reset it back to WORKER_DEFAULT_NAPTIME_MS only if (MySlotSyncWorker->monitoring_info.confirmed_lsn != remote_slot->confirmed_lsn) is detected. Basically, I think the algorithm should be like the code below: TimestampTz now = GetCurrentTimestamp(); if (MySlotSyncWorker->monitoring_info.confirmed_lsn != remote_slot->confirmed_lsn) { MySlotSyncWorker->monitoring_info.last_update_time = now; MySlotSyncWorker->monitoring_info.confirmed_lsn = remote_slot->confirmed_lsn; /* Something changed; reset naptime to default. */ *naptime = WORKER_DEFAULT_NAPTIME_MS; } else { if (*naptime == WORKER_DEFAULT_NAPTIME_MS) { /* If the inactivity time reaches the threshold, increase nap time. */ if (TimestampDifferenceExceeds(MySlotSyncWorker->monitoring_info.last_update_time, now, WORKER_INACTIVITY_THRESHOLD_MS)) *naptime = WORKER_INACTIVITY_NAPTIME_MS; } } ~~~ 11. get_remote_invalidation_cause +/* + * Get Remote Slot's invalidation cause. + * + * This gets invalidation cause of remote slot. + */ +static ReplicationSlotInvalidationCause +get_remote_invalidation_cause(WalReceiverConn *wrconn, char *slot_name) +{ Isn't that function comment just repeating itself? ~~~ 12. 
+ initStringInfo(&cmd); + appendStringInfo(&cmd, + "select pg_get_slot_invalidation_cause(%s)", + quote_literal_cstr(slot_name)); Use uppercase "SELECT" for consistency with other SQL. ~~~ 13. + /* Make things live outside TX context */ + MemoryContextSwitchTo(oldctx); + + initStringInfo(&cmd); + appendStringInfo(&cmd, + "select pg_get_slot_invalidation_cause(%s)", + quote_literal_cstr(slot_name)); + res = walrcv_exec(wrconn, cmd.data, 1, slotRow); + pfree(cmd.data); + + CommitTransactionCommand(); + + /* Switch to oldctx we saved */ + MemoryContextSwitchTo(oldctx); There are 2x MemoryContextSwitchTo(oldctx) here. Is that deliberate? ~~~ 14. + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch invalidation cuase for slot \"%s\" from" + " primary: %s", slot_name, res->err))); typo /cuase/cause/ ~~~ 15. + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + if (!tuplestore_gettupleslot(res->tuplestore, true, false, slot)) + ereport(ERROR, + (errmsg("slot \"%s\" disapeared from the primary", + slot_name))); typo /disapeared/disappeared/ ~~~ 16. drop_obsolete_slots +/* + * Drop obsolete slots + * + * Drop the slots which no longer need to be synced i.e. these either + * do not exist on primary or are no longer part of synchronize_slot_names. + * + * Also drop the slots which are valid on primary and got invalidated + * on standby due to conflict (say required rows removed on primary). + * The assumption is, these will get recreated in next sync-cycle and + * it is okay to drop and recreate such slots as long as these are not + * consumable on standby (which is the case currently). + */ /which no/that no/ /which are/that are/ /these will/that these will/ /and got invalidated/that got invalidated/ ~~~ 17. + /* If this slot is being monitored, clean-up the monitoring info */ + if (strcmp(NameStr(local_slot->data.name), + NameStr(MySlotSyncWorker->monitoring_info.slot_name)) == 0) + { + MemSet(NameStr(MySlotSyncWorker->monitoring_info.slot_name), 0, NAMEDATALEN); + MySlotSyncWorker->monitoring_info.confirmed_lsn = 0; + MySlotSyncWorker->monitoring_info.last_update_time = 0; + } Maybe it is better to assign InvalidXLogRecPtr instead of 0 to the cleared lsn. ~ Alternatively, consider just zapping the entire monitoring_info structure in one go: MemSet(&MySlotSyncWorker->monitoring_info, 0, sizeof(MySlotSyncWorker->monitoring_info)); ~~~ 18. construct_slot_query (calling use_slot_in_query) This separation of functions (use_slot_in_query / construct_slot_query) seems awkward to me. The use_slot_in_query() function is only called by construct_slot_query(). I felt it might be simpler to keep all the logical with the construct_slot_query(). Furthermore, it seemed strange to iterate all the DBs (to populate the "WHERE database IN" clause) and then iterate all the DBs multiple times again in use_slot_in_query (looking for slots to populate the "AND slot_name IN (" clause). Maybe I misunderstand the reason for this structuring, but IMO it would be simpler code to keep all the logic in construct_slot_query() like: a. Initialize with empty dblist, empty slotlist. b. Iterate all dbids - constructing the dblist as you go - constructing the slot list as you go (if synchronize_slot_names is not "" or "*") c. Finally, build the query: basic + dblist-clause + optional slotlist-clause ~~~ 19. construct_slot_query Why does this function return a boolean? I only see it returns true, but never false. ~~~ 20. 
+ { + ListCell *lc; + bool first_slot = true; + + + foreach(lc, sync_slot_names_list) Unnecessary blank line. ~~~ 21. synchronize_one_slot +/* + * Synchronize single slot to given position. + * + * This creates new slot if there is no existing one and updates the + * metadata of existing slots as per the data received from the primary. + */ +static void +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) /creates new slot/creates a new slot/ /metadata of existing slots/metadata of the slot/ ~~~ 22 + /* Search for the named slot and mark it active if we find it. */ + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); + for (int i = 0; i < max_replication_slots; i++) + { + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; + + if (!s->in_use) + continue; + + if (strcmp(NameStr(s->data.name), remote_slot->name) == 0) + { + found = true; + break; + } + } + LWLockRelease(ReplicationSlotControlLock); 22a. "and mark it active if we find it." -- What code here is marking anything active? ~ 22b. Uncommon style of loop variable declaration ~ 22c. IMO it is over-complicated code; e.g. same loop can be written like this: SUGGESTION for (i = 0; i < max_replication_slots && !found; i++) { ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; if (s->in_use) found = (strcmp(NameStr(s->data.name), remote_slot->name) == 0); } ~~~ 23. synchronize_slots + /* Construct query to get slots info from the primary */ + initStringInfo(&s); + if (!construct_slot_query(&s, dbids)) + { + pfree(s.data); + CommitTransactionCommand(); + LWLockRelease(SlotSyncWorkerLock); + return naptime; + } As noted elsewhere, it seems construct_slot_query() will never return false and so this block of code is unreachable. ~~~ 24. + /* Create list of remote slot names to be used by drop_obsolete_slots */ + remote_slot_list = lappend(remote_slot_list, remote_slot); This is a list of slots, not just slot names. ~~~ 25. + /* + * Update nap time in case of non-zero value returned. The zero value + * is returned if remote_slot is not the one being monitored. + */ + value = compute_naptime(remote_slot); + if (value) + naptime = value; If the compute_naptime API is changed as suggested in a prior review comment then this can be simplified to something like: SUGGESTION: /* Update nap time as required depending on slot activity. */ compute_naptime(remote_slot, &naptime); ~~~ 26. + /* + * Drop local slots which no longer need to be synced i.e. these either do + * not exist on primary or are no longer part of synchronize_slot_names. + */ + drop_obsolete_slots(dbids, remote_slot_list); /which no longer/that no longer/ I thought it might be better to omit the "i.e." part. Just leave it to the function-header of drop_obsolete_slots for a detailed explanation about *which* slots are candidates for dropping. ~ 27. + /* We are done, free remot_slot_list elements */ + foreach(cell, remote_slot_list) + { + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); + + pfree(remote_slot); + } 27a. /remot_slot_list/remote_slot_list/ ~ 27b. Isn't this just the same as the one-liner: list_free_deep(remote_slot_list); ~~~ 28. +/* + * Initialize the list from raw synchronize_slot_names and cache it, in order + * to avoid parsing it repeatedly. Done at slot-sync worker startup and after + * each SIGHUP. 
+ */ +static void +SlotSyncInitSlotNamesList() +{ + char *rawname; + + if (strcmp(synchronize_slot_names, "") != 0 && + strcmp(synchronize_slot_names, "*") != 0) + { + rawname = pstrdup(synchronize_slot_names); + SplitIdentifierString(rawname, ',', &sync_slot_names_list); + } +} 28a. Why this static function name is camel-case, unlike all the others? ~ 28b. What about when the sync_slot_names_list changes from value to "" or "*". Shouldn't this function be setting sync_slot_names_list = NIL for that scenario? ~~~ 29. remote_connect +/* + * Connect to remote (primary) server. + * + * This uses primary_conninfo in order to connect to primary. For slot-sync + * to work, primary_conninfo is expected to have dbname as well. + */ +static WalReceiverConn * +remote_connect() 29a. I felt it might be more helpful to say "GUC primary_conninfo" instead of just 'primary_conninfo' the first time this is mentioned. ~ 29b. /connect to primary/connect to the primary/ ~ 29c. /is expected to have/is required to specify/ ~~~ 30. reconnect_if_needed +/* + * Reconnect to remote (primary) server if PrimaryConnInfo got changed. + */ +static WalReceiverConn * +reconnect_if_needed(WalReceiverConn *wrconn_prev, char *conninfo_prev) /got changed/has changed/ ~~~ 31. +static WalReceiverConn * +reconnect_if_needed(WalReceiverConn *wrconn_prev, char *conninfo_prev) +{ + WalReceiverConn *wrconn = NULL; + + /* If no change in PrimaryConnInfo, return previous connection itself */ + if (strcmp(conninfo_prev, PrimaryConnInfo) == 0) + return wrconn_prev; + + walrcv_disconnect(wrconn); + wrconn = remote_connect(); + return wrconn; +} /return previous/return the previous/ Disconnect NULL is a bug isn't it? Don't you mean to disconnect 'wrconn_prev'? ~~~ 32. slotsync_worker_detach +/* + * Detach the worker from DSM and update 'proc' and 'in_use'. + * Logical replication launcher will come to know using these + * that the worker has shutdown. + */ +static void +slotsync_worker_detach(int code, Datum arg) +{ + dsa_detach((dsa_area *) DatumGetPointer(arg)); + LWLockAcquire(SlotSyncWorkerLock, LW_EXCLUSIVE); + MySlotSyncWorker->hdr.in_use = false; + MySlotSyncWorker->hdr.proc = NULL; + LWLockRelease(SlotSyncWorkerLock); +} I expected this function to be in the same module as slotsync_worker_attach. It seems a bit strange to have them separated. ~~~ 33. ReplSlotSyncMain + ereport(ERROR, + (errmsg("The dbname not specified in primary_conninfo, skipping" + " slots synchronization"), + errhint("Specify dbname in primary_conninfo for slots" + " synchronization to proceed"))); /not specified in/was not specified in/ /slots synchronization/slot synchronization/ (??) -- there are multiple of these ~ 34. + /* + * Connect to the database specified by user in PrimaryConnInfo. We need + * database connection for walrcv_exec to work. Please see comments atop + * libpqrcv_exec. + */ /database connection/a database connection/ ~~~ 35. + /* Reconnect if primary_conninfo got changed */ + if (config_reloaded) + wrconn = reconnect_if_needed(wrconn, conninfo_prev); SUGGESTION Reconnect if GUC primary_conninfo has changed. ~ 36. + /* + * The slot-sync worker must not get here because it will only stop when + * it receives a SIGINT from the logical replication launcher, or when + * there is an error. None of these cases will allow the code to reach + * here. + */ + Assert(false); 36a. /must not/cannot/ 36b. "None of these cases will allow the code to reach here." <-- redundant sentence ====== Kind Regards, Peter Smith. Fujitsu Australia
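Regarding comments 28a/28b (and Kuroda-san's earlier point about freeing the old list), one possible shape for that function is sketched below; it keeps the patch's GUC and list names, but the reset/free handling shown is only illustrative:

```c
static List *sync_slot_names_list = NIL;
static char *sync_slot_names_raw = NULL;	/* backing storage for the list */

/*
 * Sketch only: re-parse synchronize_slot_names at worker startup and after
 * SIGHUP, making sure the cached list is reset when the new value is "" or
 * "*" (comment 28b) and the old memory is released.  The list cells returned
 * by SplitIdentifierString() point into the raw string, so free the list
 * first and the backing storage afterwards.
 */
static void
slotsync_init_slot_names_list(void)
{
	list_free(sync_slot_names_list);
	sync_slot_names_list = NIL;
	if (sync_slot_names_raw)
		pfree(sync_slot_names_raw);
	sync_slot_names_raw = NULL;

	if (strcmp(synchronize_slot_names, "") != 0 &&
		strcmp(synchronize_slot_names, "*") != 0)
	{
		sync_slot_names_raw = pstrdup(synchronize_slot_names);
		if (!SplitIdentifierString(sync_slot_names_raw, ',',
								   &sync_slot_names_list))
			ereport(ERROR,
					(errmsg("invalid list syntax in parameter \"%s\"",
							"synchronize_slot_names")));
	}
}
```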
Hi, On 9/19/23 6:50 AM, shveta malik wrote: > On Wed, Sep 13, 2023 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Sep 13, 2023 at 4:54 PM shveta malik <shveta.malik@gmail.com> wrote: >>> >>> PFA v17. It has below changes: >>> >> >> @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, >> ReorderBufferTXN *txn, >> } >> else >> { >> + /* >> + * Before we send out the last set of changes to logical decoding >> + * output plugin, wait for specified streaming replication standby >> + * servers (if any) to confirm receipt of WAL upto commit_lsn. >> + */ >> + WaitForStandbyLSN(commit_lsn); >> >> It seems the first patch has a wait logic for every commit. I think it >> is better to integrate this wait with WalSndWaitForWal() as suggested >> by Andres in his email[1]. >> >> [1] - https://www.postgresql.org/message-id/20220207204557.74mgbhowydjco4mh%40alap3.anarazel.de >> >> -- > > Sure Amit. PFA v18. It addresses below: > > 1) patch001: wait for physical-standby confirmation logic is now > integrated with WalSndWaitForWal(). Now walsender waits for physical > standby's confirmation to take changes upto RecentFlushPtr in > WalSndWaitForWal(). This allows walsender to send the changes to > logical subscribers one by one which are already covered in > RecentFlushPtr without needing to wait on every commit for physical > standby confirmation. + /* XXX: Is waiting for 1 second before retrying enough or more or less? */ + (void) WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + 1000L, + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION); I think it would be better to let the physical walsender(s) wake up those logical walsender(s) (instead of waiting for 1 sec or such). Maybe we could introduce a new CV that would broadcast in PhysicalConfirmReceivedLocation() when restart_lsn is changed, what do you think? Still regarding preventing the logical replication to go ahead of physical replication standbys specified in standby_slot_names: we currently don't impose this limitation to pg_logical_slot_get_changes and friends (that don't start a dedicated walsender). Shouldn't we also prevent them to go ahead of physical replication standbys specified in standby_slot_names? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
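A rough sketch of the condition-variable idea (the CV field, its placement in WalSndCtl, and the helper check are assumptions, not code from any posted patch):

```c
/*
 * Sketch only: let physical walsenders wake the waiting logical walsenders
 * instead of the latter polling every second.  Assumes a ConditionVariable
 * added somewhere in shared memory (e.g. in WalSndCtlData), which the
 * current patches do not have.
 */

/* In PhysicalConfirmReceivedLocation(), after restart_lsn is advanced: */
ConditionVariableBroadcast(&WalSndCtl->physical_confirm_cv);

/* In the logical walsender, replacing the 1s WaitLatch() retry: */
ConditionVariablePrepareToSleep(&WalSndCtl->physical_confirm_cv);
while (!StandbysConfirmedUpto(wait_for_lsn))	/* hypothetical check */
	ConditionVariableSleep(&WalSndCtl->physical_confirm_cv,
						   WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION);
ConditionVariableCancelSleep();
```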
PFA v20. The changes are: 1) The launcher now checks hot_standby_feedback (along with the presence of a physical slot) before launching slot-sync workers and skips the sync if it is off. 2) Other validity checks (for primary_slot_name, dbname in primary_conninfo, etc.) are now moved to the launcher before we even launch slot-sync workers. This will fix the frequent WARNING messages appearing in the log file as reported by Bertrand. 3) Now we stop all the slot-sync workers in case any of the related GUCs has changed and then relaunch them in the next sync-cycle as per the new values and after performing the validity checks again. 4) This patch also fixes a few bugs in wait_for_primary_slot_catchup(): 4.1) This function was not coming out of the wait gracefully on the standby's promotion; it is fixed now. 4.2) The checks to start the wait were not correct. These have been fixed now. 4.3) If the slot (on which we are waiting) is invalidated on the primary in the meantime, this function was not handling that scenario and was not aborting the wait. Handled now. 5) Addressed most of the comments (dated Sep 25) given by Kuroda-san on patch 0001. The first 4 changes are in patch002 while the last one is in patch001. thanks Shveta
Attachment
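Putting the v20 preconditions together, the standby-side settings that the launcher now validates would look roughly like this (all values are placeholders):

```
# postgresql.conf on the physical standby (illustrative values)
primary_conninfo      = 'host=primary.example.com user=replicator dbname=postgres'
primary_slot_name     = 'standby1_slot'   # physical slot created on the primary
hot_standby_feedback  = on                # checked by the launcher before syncing
```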
On Mon, Sep 25, 2023 at 7:46 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Ajin, Shveta, > > Thank you for rebasing the patch set! Here are new comments for v19_2-0001. > Thank You Kuroda-san for the feedback. Most of these are addressed in v20. Please find my response inline. > 01. WalSndWaitForStandbyNeeded() > > ``` > if (SlotIsPhysical(MyReplicationSlot)) > return false; > ``` > > Is there a possibility that physical walsenders call this function? > IIUC following is a stacktrace for the function, so the only logical walsenders use it. > If so, it should be Assert() instead of an if statement. > > logical_read_xlog_page() > WalSndWaitForWal() > WalSndWaitForStandbyNeeded() It will only be called from logical-walsenders. Modified as you suggested. > > 02. WalSndWaitForStandbyNeeded() > > Can we set shouldwait in SlotSyncInitConfig()? synchronize_slot_names_list is > searched whenever the function is called, but it is not changed automatically. > If the slotname is compared with the list in the SlotSyncInitConfig(), the > liner search can be reduced. standby_slot_names and synchronize_slot_names will be removed in the next version as per discussion in [1] and thus SlotSyncInitConfig() will not be needed. It will be replaced by new functionality. So I am currently leaving it as is. > > 03. WalSndWaitForStandbyConfirmation() > > We should add ProcessRepliesIfAny() during the loop, otherwise the walsender > overlooks the death of an apply worker. > Done. > 04. WalSndWaitForStandbyConfirmation() > > Not sure, but do we have to return early if walsenders got PROCSIG_WALSND_INIT_STOPPING > signal? I thought that if physical walsenders get stuck, logical walsenders wait > forever. At that time we cannot stop the primary server even if "pg_ctl stop" > is executed. > yes, right. I have added CHECK_FOR_INTERRUPTS() and 'got_STOPPING' handling now which I think should suffice to process PROCSIG_WALSND_INIT_STOPPING. > 05. SlotSyncInitConfig() > > Why don't we free the memory for rawname, old standby_slot_names_list, and synchronize_slot_names_list? > They seem to be overwritten. > Skipped for the time being due to reasons stated in pt 2. > 06. SlotSyncInitConfig() > > Both physical and logical walsenders call the func, but physical one do not use > lists, right? If so, can we add a quick exit for physical walsenders? > Or, we should carefully remove where physical calls it. > > 07. StartReplication() > > I think we do not have to call SlotSyncInitConfig(). > Alternative approach is written in above. > I have removed it from StartReplication() > 08. the other > > Also, I found the unexpected behavior after both 0001 and 0002 were applied. > Was it normal or not? > > 1. constructed below setup > (ensured that logical slot existed on secondary) > 2. stopped the primary > 3. promoted the secondary server > 4. disabled a subscription once > 5. changed the connection string for subscriber > 6. Inserted data to new primary > 7. enabled the subscription again > 8. got an ERROR: replication slot "sub" does not exist > > I expected that the logical replication would be restarted, but it could not. > Was it real issue or my fault? The error would appear in secondary.log. > > ``` > Setup: > primary--->secondary > | > | > subscriber > ``` I have attached the new test-script (v2), can you please try that on the v20 set of patches. We should let the slot creation complete first on standby and then try promotion. I have added a few extra lines in v2 of your script for the same. 
In the test case, the primary's restart_lsn was lagging behind the new slot's restart_lsn on the standby, and thus the standby was waiting for the primary to catch up. Meanwhile, the standby got promoted and thus the slot creation got aborted. That is the reason you were not able to get logical replication working on the new primary. v20 has improved handling and better logging for such a case. Please try the attached test-script on v20. [1]: https://www.postgresql.org/message-id/CAJpy0uA%2Bt3XP2M0qtEmrOG1gSwHghjHPno5AtwTXM-94-%2Bc6JQ%40mail.gmail.com thanks Shveta
Attachment
On Wed, Sep 27, 2023 at 3:13 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 9/19/23 6:50 AM, shveta malik wrote: > > > > 1) patch001: wait for physical-standby confirmation logic is now > > integrated with WalSndWaitForWal(). Now walsender waits for physical > > standby's confirmation to take changes upto RecentFlushPtr in > > WalSndWaitForWal(). This allows walsender to send the changes to > > logical subscribers one by one which are already covered in > > RecentFlushPtr without needing to wait on every commit for physical > > standby confirmation. > > + /* XXX: Is waiting for 1 second before retrying enough or more or less? */ > + (void) WaitLatch(MyLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, > + 1000L, > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION); > > I think it would be better to let the physical walsender(s) wake up those logical > walsender(s) (instead of waiting for 1 sec or such). Maybe we could introduce a new CV that would > broadcast in PhysicalConfirmReceivedLocation() when restart_lsn is changed, what do you think? > Yes, I also think there should be some way for physical walsender to wake up logical walsenders instead of just waiting. By the way, do you think we need a GUC like standby_slot_names (please see discussion [1])? > Still regarding preventing the logical replication to go ahead of > physical replication standbys specified in standby_slot_names: we currently don't impose this > limitation to pg_logical_slot_get_changes and friends (that don't start a dedicated walsender). > > Shouldn't we also prevent them to go ahead of physical replication standbys specified in standby_slot_names? > Yes, I also think similar handling is required in pg_logical_slot_get_changes_guts(). We do call GetFlushRecPtr(), so the handling similar to what the patch is trying to do in WalSndWaitForWal() can be done. [1] - https://www.postgresql.org/message-id/CAJpy0uA%2Bt3XP2M0qtEmrOG1gSwHghjHPno5AtwTXM-94-%2Bc6JQ%40mail.gmail.com -- With Regards, Amit Kapila.
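For reference, a sketch of where such handling could hook into pg_logical_slot_get_changes_guts() (WaitForStandbyConfirmation() is the helper name from patch 0001, but this placement is only illustrative and not from any posted patch):

```c
/*
 * Sketch only: inside pg_logical_slot_get_changes_guts(), after computing
 * how far decoding may go, wait for the configured physical standbys before
 * decoding up to that point, mirroring what WalSndWaitForWal() does for
 * walsenders in patch 0001.
 */
end_of_wal = GetFlushRecPtr(NULL);

/* Do not decode past what the physical standbys have confirmed. */
WaitForStandbyConfirmation(end_of_wal);

/* ... existing decoding loop continues until end_of_wal ... */
```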
Hi, On 9/25/23 6:10 AM, shveta malik wrote: > On Fri, Sep 22, 2023 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Thu, Sep 21, 2023 at 9:16 AM shveta malik <shveta.malik@gmail.com> wrote: >>> >>> On Tue, Sep 19, 2023 at 10:29 AM shveta malik <shveta.malik@gmail.com> wrote: >>> >>> Currently in patch001, synchronize_slot_names is a GUC on both primary >>> and physical standby. This GUC tells which all logical slots need to >>> be synced on physical standbys from the primary. Ideally it should be >>> a GUC on physical standby alone and each physical standby should be >>> able to communicate the value to the primary (considering the value >>> may vary for different physical replicas of the same primary). The >>> primary on the other hand should be able to take UNION of these values >>> and let the logical walsenders (belonging to the slots in UNION >>> synchronize_slots_names) wait for physical standbys for confirmation >>> before sending those changes to logical subscribers. The intent is >>> logical subscribers should never be ahead of physical standbys. >>> >> >> Before getting into the details of 'synchronize_slot_names', I would >> like to know whether we really need the second GUC >> 'standby_slot_names'. Can't we simply allow all the logical wal >> senders corresponding to 'synchronize_slot_names' to wait for just the >> physical standby(s) (physical slot corresponding to such physical >> standby) that have sent ' synchronize_slot_names'list? We should have >> one physical standby slot corresponding to one physical standby. >> > > yes, with the new approach (to be implemented next) where we plan to > send synchronize_slot_names from each physical standby to primary, the > standby_slot_names GUC should no longer be needed on primary. The > physical standbys sending requests should automatically become the > ones to be waited for confirmation on the primary. > I think that standby_slot_names could be used to do some filtering (means for which standby(s) we don't want the logical replication on the primary to go ahead and for which standby(s) one would allow it). I think that removing the GUC would: - remove this flexibility - probably open corner cases like: what if a standby is down? would that mean that synchronize_slot_names not being send to the primary would allow the decoding on the primary to go ahead? So, I'm not sure we should remove this GUC. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 9/25/23 6:10 AM, shveta malik wrote: > > On Fri, Sep 22, 2023 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> On Thu, Sep 21, 2023 at 9:16 AM shveta malik <shveta.malik@gmail.com> wrote: > >>> > >>> On Tue, Sep 19, 2023 at 10:29 AM shveta malik <shveta.malik@gmail.com> wrote: > >>> > >>> Currently in patch001, synchronize_slot_names is a GUC on both primary > >>> and physical standby. This GUC tells which all logical slots need to > >>> be synced on physical standbys from the primary. Ideally it should be > >>> a GUC on physical standby alone and each physical standby should be > >>> able to communicate the value to the primary (considering the value > >>> may vary for different physical replicas of the same primary). The > >>> primary on the other hand should be able to take UNION of these values > >>> and let the logical walsenders (belonging to the slots in UNION > >>> synchronize_slots_names) wait for physical standbys for confirmation > >>> before sending those changes to logical subscribers. The intent is > >>> logical subscribers should never be ahead of physical standbys. > >>> > >> > >> Before getting into the details of 'synchronize_slot_names', I would > >> like to know whether we really need the second GUC > >> 'standby_slot_names'. Can't we simply allow all the logical wal > >> senders corresponding to 'synchronize_slot_names' to wait for just the > >> physical standby(s) (physical slot corresponding to such physical > >> standby) that have sent ' synchronize_slot_names'list? We should have > >> one physical standby slot corresponding to one physical standby. > >> > > > > yes, with the new approach (to be implemented next) where we plan to > > send synchronize_slot_names from each physical standby to primary, the > > standby_slot_names GUC should no longer be needed on primary. The > > physical standbys sending requests should automatically become the > > ones to be waited for confirmation on the primary. > > > > I think that standby_slot_names could be used to do some filtering (means > for which standby(s) we don't want the logical replication on the primary to go > ahead and for which standby(s) one would allow it). > Isn't it implicit that the physical standby that has requested 'synchronize_slot_names' should be ahead of their corresponding logical walsenders? Otherwise, after the switchover to the new physical standby, the logical subscriptions won't work. > I think that removing the GUC would: > > - remove this flexibility > I think if required we can add such a GUC later as well. Asking users to set more parameters also makes the feature less attractive, so I am trying to see if we can avoid this GUC. > - probably open corner cases like: what if a standby is down? would that mean > that synchronize_slot_names not being send to the primary would allow the decoding > on the primary to go ahead? > Good question. BTW, irrespective of whether we have 'standby_slot_names' parameters or not, how should we behave if standby is down? Say, if 'synchronize_slot_names' is only specified on standby then in such a situation primary won't be even aware that some of the logical walsenders need to wait. OTOH, one can say that users should configure 'synchronize_slot_names' on both primary and standby but note that this value could be different for different standby's, so we can't configure it on primary. -- With Regards, Amit Kapila.
Hi, On 9/29/23 1:33 PM, Amit Kapila wrote: > On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> >> I think that standby_slot_names could be used to do some filtering (means >> for which standby(s) we don't want the logical replication on the primary to go >> ahead and for which standby(s) one would allow it). >> > > Isn't it implicit that the physical standby that has requested > 'synchronize_slot_names' should be ahead of their corresponding > logical walsenders? Otherwise, after the switchover to the new > physical standby, the logical subscriptions won't work. Right, but the idea was to let the flexibility to bypass this constraint. Use case was to avoid a physical standby being down preventing the decoding on the primary. > >> I think that removing the GUC would: >> >> - remove this flexibility >> > > I think if required we can add such a GUC later as well. Asking users > to set more parameters also makes the feature less attractive, so I am > trying to see if we can avoid this GUC. Agree but I think we have to address the standby being down case. > >> - probably open corner cases like: what if a standby is down? would that mean >> that synchronize_slot_names not being send to the primary would allow the decoding >> on the primary to go ahead? >> > > Good question. BTW, irrespective of whether we have > 'standby_slot_names' parameters or not, how should we behave if > standby is down? Say, if 'synchronize_slot_names' is only specified on > standby then in such a situation primary won't be even aware that some > of the logical walsenders need to wait. Exactly, that's why I was thinking keeping standby_slot_names to address this scenario. In such a case one could simply decide to keep or remove the associated physical replication slot from standby_slot_names. Keep would mean "wait" and removing would mean allow to decode on the primary. > OTOH, one can say that users > should configure 'synchronize_slot_names' on both primary and standby > but note that this value could be different for different standby's, > so we can't configure it on primary. > Yeah, I think that's a good use case for standby_slot_names, what do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Dear Shveta, Thank you for updating the patch! I found another ERROR due to the slot removal. Is this a real issue?

1. applied add_sleep.txt, which emulates the case where the tablesync worker gets stuck and the primary crashes during the initial sync.
2. executed test_0925_v2.sh (which you attached in [1])
3. the secondary could not start logical replication because the slot was not created (log files are also attached).

Here is my analysis. The cause is that the slotsync worker aborts the slot creation on the secondary server because the secondary's restart_lsn is ahead of the primary's. IIUC this can occur when a tablesync worker finishes the initial copy before the walsender streams changes. In this case, the relstate of the worker is set to SUBREL_STATE_CATCHUP and the apply worker waits till the relation becomes SUBREL_STATE_SYNCDONE. From here the slot on the primary will not be updated until the relation is caught up. If some changes come in and the primary crashes at that time, the slotsync worker will abort the slot creation.

Anyway, the following are my comments. I have not checked detailed conventions yet; that can be done in a later stage.

~~~~~~~~~~~~~~~~
For 0001:

=====
WalSndWaitForStandbyConfirmation()

```
+ /* If postmaster asked us to stop, don't wait anymore */
+ if (got_STOPPING)
+ break;
```

I have considered this again, and it may still have an issue: logical walsenders may break from the loop before physical walsenders send the WAL. This can happen because both physical and logical walsenders would get PROCSIG_WALSND_INIT_STOPPING. I think a function like WalSndWaitStopping() is needed, which waits until physical walsenders become WALSNDSTATE_STOPPING or exit. Thoughts?

WalSndWaitForStandbyConfirmation()

```
+ standby_slot_cpy = list_copy(standby_slot_names_list);
```

I found that standby_slot_names_list and standby_slot_cpy are not updated even if the GUC is updated. Is this acceptable? Will this still be the case after you refactor the patch? What would happen when synchronize_slot_names is updated on the secondary while the primary is executing this?

WalSndWaitForStandbyConfirmation()

```
+
+ goto retry;
```

I checked other uses of "goto retry;", but I could not find a pattern where no return clause exists after the goto (exception: void functions). I also think the current style seems a bit strange. How about using an outer loop like while (list_length(standby_slot_cpy))? (A sketch of this is included after this message.)

=====
slot.h

```
+extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn);
```

WaitForStandbyLSN() does not exist.

~~~~~~~~~~~~~~~~
For 0002:

=====
General

The patch requires that primary_conninfo contain the dbname, but this conflicts with the documentation, which says:

```
...Do not specify a database name in the primary_conninfo string.
```

I confirmed [^a] that it is harmless for primary_conninfo to have a dbname, but at least the description must be fixed.

General

I found that the primary server outputs a huge amount of logs when log_min_duration_statement = 0. This is because the slotsync worker sends an SQL query every 10ms, in wait_for_primary_slot_catchup(). Is there any good way to suppress it? Or should we be patient?

=====

```
+{ oid => '6312', descr => 'what caused the replication slot to become invalid',
```

How did you determine the OID? IIRC, features under development should use OIDs in the range 8000-9999. See src/include/catalog/unused_oids.

=====
LogicalRepCtxStruct

```
 /* Background workers. */
+ SlotSyncWorker *ss_workers; /* slot-sync workers */
 LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
```

It's OK for now, but can we combine them into one array? IIUC there is no possibility that both kinds of processes exist at the same time, and they have the same components, so they may be able to share the same array. That would remove an attribute, but it may make the code a bit harder to read.

WaitForReplicationWorkerAttach() and logicalrep_worker_stop_internal()

I could not find other cases that take an "LWLock *" as an argument (exception: functions in lwlock.c). Is it sufficient to check RecoveryInProgress() instead of passing it as an argument?

=====
wait_for_primary_slot_catchup()

```
+ /* Check if this standby is promoted while we are waiting */
+ if (!RecoveryInProgress())
+ {
+ /*
+ * The remote slot didn't pass the locally reserved position at
+ * the time of local promotion, so it's not safe to use.
+ */
+ ereport(
+ WARNING,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg(
+ "slot-sync wait for slot %s interrupted by promotion, "
+ "slot creation aborted", remote_slot->name)));
+ pfree(cmd.data);
+ return false;
+ }
```

This part would not be executed if the promote signal is sent after the primary server crashes. I think walrcv_exec() will detect the failure first. The function must be wrapped in PG_TRY() and the message must be emitted in PG_CATCH(). There may be other approaches.

wait_for_primary_slot_catchup()

```
+ rc = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ WORKER_DEFAULT_NAPTIME_MS,
+ WAIT_EVENT_REPL_SLOTSYNC_MAIN);
```

A new wait event can be added.

[1]: https://www.postgresql.org/message-id/CAJpy0uDD%2B9aJnDx9fBfvLvxJtxA7qqoAys4fo6h1tq1b_0_A7Q%40mail.gmail.com

[^a] Regarding the secondary side, libpqrcv_connect() does not do anything special even if primary_conninfo has dbname="XXX". It adds parameters like "replication=true" and sends a startup packet. As for the primary side, the startup packet is consumed in ProcessStartupPacket(). It checks whether the process should be a walsender or not (line 2204). Then (line 2290) port->database_name[0] is set to '\0' in the case of a walsender. The value is used for setting the process title in BackendInitialize(). Also, InitPostgres() does set some global variables like MyDatabaseId, but that does not happen when the process is a walsender.

Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
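As a rough illustration of the loop restructuring suggested in the review above (replacing "goto retry" with an outer while loop), here is a sketch. It assumes the patch's standby_slot_names_list and the patch's wait-event name; StandbySlotConfirmedLSN() is a made-up helper standing in for however the patch reads a standby slot's confirmed position, so this is not the actual patch code.

```c
#include "postgres.h"

#include "access/xlogdefs.h"
#include "miscadmin.h"
#include "nodes/pg_list.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

/* assumed to come from the patch */
extern List *standby_slot_names_list;
/* made-up helper: confirmed position of the named physical slot */
extern XLogRecPtr StandbySlotConfirmedLSN(const char *slot_name);

static void
WalSndWaitForStandbyConfirmation(XLogRecPtr wait_lsn)
{
	List	   *remaining = list_copy(standby_slot_names_list);

	while (list_length(remaining) > 0)
	{
		ListCell   *lc;

		/* drop every slot that has already confirmed wait_lsn */
		foreach(lc, remaining)
		{
			char	   *slot_name = (char *) lfirst(lc);

			if (StandbySlotConfirmedLSN(slot_name) >= wait_lsn)
				remaining = foreach_delete_current(remaining, lc);
		}

		if (remaining == NIL)
			break;

		/* sleep until woken (or the timeout elapses), then re-check */
		(void) WaitLatch(MyLatch,
						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
						 1000L,
						 WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION);
		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();
	}

	list_free(remaining);
}
```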
On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 9/29/23 1:33 PM, Amit Kapila wrote: > > On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > > > >> - probably open corner cases like: what if a standby is down? would that mean > >> that synchronize_slot_names not being send to the primary would allow the decoding > >> on the primary to go ahead? > >> > > > > Good question. BTW, irrespective of whether we have > > 'standby_slot_names' parameters or not, how should we behave if > > standby is down? Say, if 'synchronize_slot_names' is only specified on > > standby then in such a situation primary won't be even aware that some > > of the logical walsenders need to wait. > > Exactly, that's why I was thinking keeping standby_slot_names to address > this scenario. In such a case one could simply decide to keep or remove > the associated physical replication slot from standby_slot_names. Keep would > mean "wait" and removing would mean allow to decode on the primary. > > > OTOH, one can say that users > > should configure 'synchronize_slot_names' on both primary and standby > > but note that this value could be different for different standby's, > > so we can't configure it on primary. > > > > Yeah, I think that's a good use case for standby_slot_names, what do you think? > But, even if we keep 'standby_slot_names' for this purpose, the primary doesn't know the value of 'synchronize_slot_names' once the standby is down and or the primary is restarted. So, how will we know which logical WAL senders needs to wait for 'standby_slot_names'? -- With Regards, Amit Kapila.
Dear Shveta, While investigating more, I found that the launcher crashes while executing the attached script; please see the attachment. In this script, the subscriber was also the publisher. Both subscriber and subscriber2 referred to the same replication slot, which was synchronized by the slotsync worker. I was not quite sure whether the synchronization should occur in this case, but at the very least it must not dump core. The secondary server crashed.

primary ---> secondary
    |             |
subscriber    subscriber2

I checked the stack trace and found that the launcher crashed in slotsync_remove_obsolete_dbs().

```
(gdb) bt
#0  0x0000000000b310a9 in check_for_freed_segments (area=0x3a4ec68) at ../postgres/src/backend/utils/mmgr/dsa.c:2248
#1  0x0000000000b2e856 in dsa_get_address (area=0x3a4ec68, dp=16384) at ../postgres/src/backend/utils/mmgr/dsa.c:959
#2  0x00000000008a2bb5 in slotsync_remove_obsolete_dbs (remote_dbs=0x1fcea70) at ../postgres/src/backend/replication/logical/launcher.c:1615
#3  0x00000000008a318d in ApplyLauncherStartSlotSync (wait_time=0x7ffe15cd57a8, wrconn=0x1f82ec0) at ../postgres/src/backend/replication/logical/launcher.c:1799
#4  0x00000000008a3667 in ApplyLauncherMain (main_arg=0) at ../postgres/src/backend/replication/logical/launcher.c:1967
#5  0x0000000000863aef in StartBackgroundWorker () at ../postgres/src/backend/postmaster/bgworker.c:867
#6  0x000000000086e260 in do_start_bgworker (rw=0x1f6b4e0) at ../postgres/src/backend/postmaster/postmaster.c:5740
#7  0x000000000086e649 in maybe_start_bgworkers () at ../postgres/src/backend/postmaster/postmaster.c:5964
#8  0x000000000086953d in ServerLoop () at ../postgres/src/backend/postmaster/postmaster.c:1852
#9  0x0000000000868c42 in PostmasterMain (argc=3, argv=0x1f3e240) at ../postgres/src/backend/postmaster/postmaster.c:1465
#10 0x000000000075ad5f in main (argc=3, argv=0x1f3e240) at ../postgres/src/backend/main/main.c:198
```

Best Regards, Hayato Kuroda FUJITSU LIMITED
Attachment
Hi, On 10/3/23 12:54 PM, Amit Kapila wrote: > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 9/29/23 1:33 PM, Amit Kapila wrote: >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>> >>>> - probably open corner cases like: what if a standby is down? would that mean >>>> that synchronize_slot_names not being send to the primary would allow the decoding >>>> on the primary to go ahead? >>>> >>> >>> Good question. BTW, irrespective of whether we have >>> 'standby_slot_names' parameters or not, how should we behave if >>> standby is down? Say, if 'synchronize_slot_names' is only specified on >>> standby then in such a situation primary won't be even aware that some >>> of the logical walsenders need to wait. >> >> Exactly, that's why I was thinking keeping standby_slot_names to address >> this scenario. In such a case one could simply decide to keep or remove >> the associated physical replication slot from standby_slot_names. Keep would >> mean "wait" and removing would mean allow to decode on the primary. >> >>> OTOH, one can say that users >>> should configure 'synchronize_slot_names' on both primary and standby >>> but note that this value could be different for different standby's, >>> so we can't configure it on primary. >>> >> >> Yeah, I think that's a good use case for standby_slot_names, what do you think? >> > > But, even if we keep 'standby_slot_names' for this purpose, the > primary doesn't know the value of 'synchronize_slot_names' once the > standby is down and or the primary is restarted. So, how will we know > which logical WAL senders needs to wait for 'standby_slot_names'? > Yeah right, I also think we'd need: - synchronize_slot_names on both primary and standby But now we would need to take care of different standby having different values ( as you said up-thread).... Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say "logical_slot_name:physical_slot". I think this GUC would help us define each walsender behavior (should the standby(s) be up or down): - don't wait if its associated logical_slot is not listed in this GUC - or wait based on its associated "list" of mapped physical slots (would probably have to deal with the min restart_lsn for all the corresponding mapped ones). I don't think we can avoid having to define at least one GUC on the primary (at least to handle the case of standby(s) being down). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
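To make the proposed "logical_slot_name:physical_slot" mapping format concrete, here is a small standalone parser for a value in that shape. It is purely illustrative of the format being discussed; the GUC itself (and its name) is only a proposal in this thread.

```c
/* Parse a value such as "slot_a:phys1, slot_b:phys2" into pairs. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	char		value[] = "slot_a:phys1, slot_b:phys2";
	char	   *saveptr;

	for (char *tok = strtok_r(value, ",", &saveptr);
		 tok != NULL;
		 tok = strtok_r(NULL, ",", &saveptr))
	{
		while (*tok == ' ')		/* trim leading spaces */
			tok++;

		char	   *colon = strchr(tok, ':');

		if (colon == NULL)
		{
			fprintf(stderr, "invalid entry: \"%s\"\n", tok);
			return EXIT_FAILURE;
		}
		*colon = '\0';
		printf("logical slot \"%s\" waits for physical slot \"%s\"\n",
			   tok, colon + 1);
	}
	return EXIT_SUCCESS;
}
```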
On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/3/23 12:54 PM, Amit Kapila wrote: > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> On 9/29/23 1:33 PM, Amit Kapila wrote: > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >>>> > >>> > >>>> - probably open corner cases like: what if a standby is down? would that mean > >>>> that synchronize_slot_names not being send to the primary would allow the decoding > >>>> on the primary to go ahead? > >>>> > >>> > >>> Good question. BTW, irrespective of whether we have > >>> 'standby_slot_names' parameters or not, how should we behave if > >>> standby is down? Say, if 'synchronize_slot_names' is only specified on > >>> standby then in such a situation primary won't be even aware that some > >>> of the logical walsenders need to wait. > >> > >> Exactly, that's why I was thinking keeping standby_slot_names to address > >> this scenario. In such a case one could simply decide to keep or remove > >> the associated physical replication slot from standby_slot_names. Keep would > >> mean "wait" and removing would mean allow to decode on the primary. > >> > >>> OTOH, one can say that users > >>> should configure 'synchronize_slot_names' on both primary and standby > >>> but note that this value could be different for different standby's, > >>> so we can't configure it on primary. > >>> > >> > >> Yeah, I think that's a good use case for standby_slot_names, what do you think? > >> > > > > But, even if we keep 'standby_slot_names' for this purpose, the > > primary doesn't know the value of 'synchronize_slot_names' once the > > standby is down and or the primary is restarted. So, how will we know > > which logical WAL senders needs to wait for 'standby_slot_names'? > > > > Yeah right, I also think we'd need: > > - synchronize_slot_names on both primary and standby > > But now we would need to take care of different standby having different values ( > as you said up-thread).... > > Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor > synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say > "logical_slot_name:physical_slot". > > I think this GUC would help us define each walsender behavior (should the standby(s) > be up or down): > It may help in defining the walsender's behaviour better for sure. But the problem I see once we start defining sync-slot-names on primary (in any form whether as independent GUC or as above mapping GUC) is that it needs to be then in sync with standbys, as each standby for sure needs to maintain its own sync-slot-names GUC to make it aware of what all it needs to sync. This brings us to the original question of how do we actually keep these configurations in sync between primary and standby if we plan to maintain it on both? > - don't wait if its associated logical_slot is not listed in this GUC > - or wait based on its associated "list" of mapped physical slots (would probably > have to deal with the min restart_lsn for all the corresponding mapped ones). > > I don't think we can avoid having to define at least one GUC on the primary (at least to > handle the case of standby(s) being down). > > Thoughts? > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 10/3/23 12:54 PM, Amit Kapila wrote: > > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > >> > > >> On 9/29/23 1:33 PM, Amit Kapila wrote: > > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand > > >>> <bertranddrouvot.pg@gmail.com> wrote: > > >>>> > > >>> > > >>>> - probably open corner cases like: what if a standby is down? would that mean > > >>>> that synchronize_slot_names not being send to the primary would allow the decoding > > >>>> on the primary to go ahead? > > >>>> > > >>> > > >>> Good question. BTW, irrespective of whether we have > > >>> 'standby_slot_names' parameters or not, how should we behave if > > >>> standby is down? Say, if 'synchronize_slot_names' is only specified on > > >>> standby then in such a situation primary won't be even aware that some > > >>> of the logical walsenders need to wait. > > >> > > >> Exactly, that's why I was thinking keeping standby_slot_names to address > > >> this scenario. In such a case one could simply decide to keep or remove > > >> the associated physical replication slot from standby_slot_names. Keep would > > >> mean "wait" and removing would mean allow to decode on the primary. > > >> > > >>> OTOH, one can say that users > > >>> should configure 'synchronize_slot_names' on both primary and standby > > >>> but note that this value could be different for different standby's, > > >>> so we can't configure it on primary. > > >>> > > >> > > >> Yeah, I think that's a good use case for standby_slot_names, what do you think? > > >> > > > > > > But, even if we keep 'standby_slot_names' for this purpose, the > > > primary doesn't know the value of 'synchronize_slot_names' once the > > > standby is down and or the primary is restarted. So, how will we know > > > which logical WAL senders needs to wait for 'standby_slot_names'? > > > > > > > Yeah right, I also think we'd need: > > > > - synchronize_slot_names on both primary and standby > > > > But now we would need to take care of different standby having different values ( > > as you said up-thread).... > > > > Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor > > synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say > > "logical_slot_name:physical_slot". > > > > I think this GUC would help us define each walsender behavior (should the standby(s) > > be up or down): > > > > It may help in defining the walsender's behaviour better for sure. But > the problem I see once we start defining sync-slot-names on primary > (in any form whether as independent GUC or as above mapping GUC) is > that it needs to be then in sync with standbys, as each standby for > sure needs to maintain its own sync-slot-names GUC to make it aware of > what all it needs to sync. Yes, I also think so. Also, defining such a GUC where user wants to sync all the slots which would normally be the case would be a night mare for the users. > > This brings us to the original question of > how do we actually keep these configurations in sync between primary > and standby if we plan to maintain it on both? 
> > > > - don't wait if its associated logical_slot is not listed in this GUC > > - or wait based on its associated "list" of mapped physical slots (would probably > > have to deal with the min restart_lsn for all the corresponding mapped ones). > > > > I don't think we can avoid having to define at least one GUC on the primary (at least to > > handle the case of standby(s) being down). > > How about an alternate scheme where we define sync_slot_names on standby but then store the physical_slot_name in the corresponding logical slot (ReplicationSlotPersistentData) to be synced? So, the standby will send the list of 'sync_slot_names' and the primary will add the physical standby's slot_name in each of the corresponding sync_slot. Now, if we do this then even after restart, we should be able to know for which physical slot each logical slot needs to wait. We can even provide an SQL API to reset the value of standby_slot_names in logical slots as a way to unblock decoding in case of emergency (for example, corresponding when physical standby never comes up). -- With Regards, Amit Kapila.
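A rough, fragment-level sketch of what that alternate scheme could look like, purely for illustration: the standby_slot_name field in ReplicationSlotPersistentData and the SQL-callable reset function below are assumptions, not existing PostgreSQL code, and error handling is omitted.

```c
#include "postgres.h"

#include "fmgr.h"
#include "replication/slot.h"
#include "storage/spin.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(pg_reset_standby_slot_name);

/*
 * Emergency unblock: clear the physical slot name stored in a logical slot,
 * so its walsender no longer waits for that standby.  Assumes a new
 * NameData standby_slot_name field in ReplicationSlotPersistentData.
 */
Datum
pg_reset_standby_slot_name(PG_FUNCTION_ARGS)
{
	Name		slotname = PG_GETARG_NAME(0);

	ReplicationSlotAcquire(NameStr(*slotname), true);

	SpinLockAcquire(&MyReplicationSlot->mutex);
	namestrcpy(&MyReplicationSlot->data.standby_slot_name, "");
	SpinLockRelease(&MyReplicationSlot->mutex);

	/* persist the change so it survives a restart */
	ReplicationSlotMarkDirty();
	ReplicationSlotSave();
	ReplicationSlotRelease();

	PG_RETURN_VOID();
}
```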
Here are some review comments for v20-0002. ====== 1. GENERAL - errmsg/elog messages There are a a lot of minor problems and/or quirks across all the message texts. Here is a summary of some I found: ERROR errmsg("could not receive list of slots from the primary server: %s", errmsg("invalid response from primary server"), errmsg("invalid connection string syntax: %s", errmsg("replication slot-sync worker slot %d is empty, cannot attach", errmsg("replication slot-sync worker slot %d is already used by another worker, cannot attach", errmsg("replication slot-sync worker slot %d is already used by another worker, cannot attach", errmsg("could not connect to the primary server: %s", errmsg("operation not permitted on replication slots on standby which are synchronized from primary"))); /primary/the primary/ errmsg("could not fetch invalidation cuase for slot \"%s\" from primary: %s", /cuase/cause/ /primary/the primary/ errmsg("slot \"%s\" disapeared from the primary", /disapeared/disappeared/ errmsg("could not fetch slot info from the primary: %s", errmsg("could not connect to the primary server: %s", err))); errmsg("could not map dynamic shared memory segment for slot-sync worker"))); errmsg("physical replication slot %s found in synchronize_slot_names", slot name not quoted? --- WARNING errmsg("out of background worker slots"), errmsg("Replication slot-sync worker failed to attach to worker-pool slot %d", case? errmsg("Removed database %d from replication slot-sync worker %d; dbcount now: %d", case? errmsg("Skipping slots synchronization as primary_slot_name is not set.")); case? errmsg("Skipping slots synchronization as hot_standby_feedback is off.")); case? errmsg("Skipping slots synchronization as dbname is not specified in primary_conninfo.")); case? errmsg("slot-sync wait for slot %s interrupted by promotion, slot creation aborted", errmsg("could not fetch slot info for slot \"%s\" from primary: %s", /primary/the primary/ errmsg("slot \"%s\" disappeared from the primary, aborting slot creation", errmsg("slot \"%s\" invalidated on primary, aborting slot creation", errmsg("slot-sync for slot %s interrupted by promotion, sync not possible", slot name not quoted? errmsg("skipping sync of slot \"%s\" as the received slot-sync lsn %X/%X is ahead of the standby position %X/%X", errmsg("not synchronizing slot %s; synchronization would move it backward", slot name not quoted? /backward/backwards/ --- LOG errmsg("Added database %d to replication slot-sync worker %d; dbcount now: %d", errmsg("Added database %d to replication slot-sync worker %d; dbcount now: %d", errmsg("Stopping replication slot-sync worker %d", errmsg("waiting for remote slot \"%s\" LSN (%u/%X) and catalog xmin (%u) to pass local slot LSN (%u/%X) and and catalog xmin (%u)", errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)and catalog xmin (%u) has now passed local slot LSN (%X/%X) and catalog xmin (%u)", missing spaces? elog(LOG, "Dropped replication slot \"%s\" ", extra space? why this one is elog but others are not? elog(LOG, "Replication slot-sync worker %d is shutting down on receiving SIGINT", MySlotSyncWorker->slot); case? why this one is elog but others are not? elog(LOG, "Replication slot-sync worker %d started", worker_slot); case? why this one is elog but others are not? ---- DEBUG1 errmsg("allocated dsa for slot-sync worker for dbcount: %d" worker number not given? should be elog? errmsg_internal("logical replication launcher started") should be elog? 
---- DEBUG2 elog(DEBUG2, "slot-sync worker%d's query:%s \n", missing space after 'worker' extra space before \n ====== .../libpqwalreceiver/libpqwalreceiver.c 2. libpqrcv_get_dbname_from_conninfo +/* + * Get database name from primary conninfo. + * + * If dbanme is not found in connInfo, return NULL value. + * The caller should take care of handling NULL value. + */ +static char * +libpqrcv_get_dbname_from_conninfo(const char *connInfo) 2a. /dbanme/dbname/ ~ 2b. "The caller should take care of handling NULL value." IMO this is not very useful; it's like saying "caller must handle function return values". ~~~ 3. + for (opt = opts; opt->keyword != NULL; ++opt) + { + /* Ignore connection options that are not present. */ + if (opt->val == NULL) + continue; + + if (strcmp(opt->keyword, "dbname") == 0 && opt->val[0] != '\0') + { + dbname = pstrdup(opt->val); + } + } 3a. If there are multiple "dbname" in the conninfo then it will be the LAST one that is returned. Judging by my quick syntax experiment (below) this seemed like the correct thing to do, but I think there should be some comment to explain about it. test_sub=# create subscription sub1 connection 'dbname=foo dbname=bar dbname=test_pub' publication pub1; 2023-09-28 19:15:15.012 AEST [23997] WARNING: subscriptions created by regression test cases should have names starting with "regress_" WARNING: subscriptions created by regression test cases should have names starting with "regress_" NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION ~ 3b. The block brackets {} are not needed for the single statement. ~ 3c. Since there is only one keyword of interest here it seemed overkill to have a separate 'continue' check. Why not do everything in one line: for (opt = opts; opt->keyword != NULL; ++opt) { if (strcmp(opt->keyword, "dbname") == 0 && opt->val && opt->val[0] != '\0') dbname = pstrdup(opt->val); } ====== src/backend/replication/logical/launcher.c 4. +/* + * The local variables to store the current values of slot-sync related GUCs + * before each ConfigReload. + */ +static char *PrimaryConnInfoPreReload = NULL; +static char *PrimarySlotNamePreReload = NULL; +static char *SyncSlotNamesPreReload = NULL; /The local variables/Local variables/ ~~~ 5. fwd declare static void logicalrep_worker_cleanup(LogicalRepWorker *worker); +static void slotsync_worker_cleanup(SlotSyncWorker *worker); static int logicalrep_pa_worker_count(Oid subid); 5a. Hmmn, I think there were lot more added static functions than just this one. e.g. what about all these? static SlotSyncWorker *slotsync_worker_find static dsa_handle slotsync_dsa_setup static bool slotsync_worker_launch_or_reuse static void slotsync_worker_stop_internal static void slotsync_workers_stop static void slotsync_remove_obsolete_dbs static WalReceiverConn *primary_connect static void SaveCurrentSlotSyncConfigs static bool SlotSyncConfigsChanged static void ApplyLauncherStartSlotSync static void ApplyLauncherStartSubs ~ 5b. There are inconsistent name style used for the new static functions -- e.g. snake_case versus CamelCase. ~~~ 6. WaitForReplicationWorkerAttach int rc; + bool is_slotsync_worker = (lock == SlotSyncWorkerLock) ? true : false; This seemed a hacky way to distinguish the sync-slot workers from other kinds of workers. Wouldn't it be better to pass another parameter to this function? ~~~ 7. slotsync_worker_attach It looks like almost a clone of the logicalrep_worker_attach. Seems a shame if cannot make use of common code. ~~~ 8. 
slotsync_worker_find + * Walks the slot-sync workers pool and searches for one that matches given + * dbid. Since one worker can manage multiple dbs, so it walks the db array in + * each worker to find the match. 8a. SUGGESTION Searches the slot-sync worker pool for the worker who manages the specified dbid. Because a worker can manage multiple dbs, also walk the db array of each worker to find the match. ~ 8b. Should the comment also say something like "Returns NULL if no matching worker is found." ~~~ 9. + /* Search for attached worker for a given dbid */ SUGGESTION Search for an attached worker managing the given dbid. ~~~ 10. +{ + int i; + SlotSyncWorker *res = NULL; + Oid *dbids; + + Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); + + /* Search for attached worker for a given dbid */ + for (i = 0; i < max_slotsync_workers; i++) + { + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; + int cnt; + + if (!w->hdr.in_use) + continue; + + dbids = (Oid *) dsa_get_address(w->dbids_dsa, w->dbids_dp); + for (cnt = 0; cnt < w->dbcount; cnt++) + { + Oid wdbid = dbids[cnt]; + + if (wdbid == dbid) + { + res = w; + break; + } + } + + /* If worker is found, break the outer loop */ + if (res) + break; + } + + return res; +} IMO this logical can be simplified a lot: - by not using the 'res' variable; directly return instead. - also moved the 'dbids' declaration. - and 'cnt' variable seems not meaningful; replace with 'dbidx' for the db array index IMO. For example (25 lines instead of 35 lines) { int i; Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); /* Search for an attached worker managing the given dbid. */ for (i = 0; i < max_slotsync_workers; i++) { SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; int dbidx; Oid *dbids; if (!w->hdr.in_use) continue; dbids = (Oid *) dsa_get_address(w->dbids_dsa, w->dbids_dp); for (dbidx = 0; dbidx < w->dbcount; dbidx++) { if (dbids[dbidx] == dbid) return w; } } return NULL; } ~~~ 11. slot_sync_dsa_setup +/* + * Setup DSA for slot-sync worker. + * + * DSA is needed for dbids array. Since max number of dbs a worker can manage + * is not known, so initially fixed size to hold DB_PER_WORKER_ALLOC_INIT + * dbs is allocated. If this size is exhausted, it can be extended using + * dsa free and allocate routines. + */ +static dsa_handle +slotsync_dsa_setup(SlotSyncWorker *worker, int alloc_db_count) 11a. SUGGESTION DSA is used for the dbids array. Because the maximum number of dbs a worker can manage is not known, initially enough memory for DB_PER_WORKER_ALLOC_INIT dbs is allocated. If this size is exhausted, it can be extended using dsa free and allocate routines. ~ 11b. It doesn't make sense for the comment to say DB_PER_WORKER_ALLOC_INIT is the initial allocation, but then the function has a parameter 'alloc_db_count' (which is always passed as DB_PER_WORKER_ALLOC_INIT). IMO revemo the 2nd parameter from this function and hardwire the initial allocation same as what the function comment says. ~~~ 12. + /* Be sure any memory allocated by DSA routines is persistent. */ + oldcontext = MemoryContextSwitchTo(TopMemoryContext); /Be sure any memory/Ensure the memory/ ~~~ 13. slotsync_worker_launch_or_reuse +/* + * Slot-sync worker launch or reuse + * + * Start new slot-sync background worker from the pool of available workers + * going by max_slotsync_workers count. If the worker pool is exhausted, + * reuse the existing worker with minimum number of dbs. The idea is to + * always distribute the dbs equally among launched workers. 
+ * If initially allocated dbids array is exhausted for the selected worker, + * reallocate the dbids array with increased size and copy the existing + * dbids to it and assign the new one as well. + * + * Returns true on success, false on failure. + */ /going by/limited by/ (??) ~~~ 14. + BackgroundWorker bgw; + BackgroundWorkerHandle *bgw_handle; + uint16 generation; + SlotSyncWorker *worker = NULL; + uint32 mindbcnt = 0; + uint32 alloc_count = 0; + uint32 copied_dbcnt = 0; + Oid *copied_dbids = NULL; + int worker_slot = -1; + dsa_handle handle; + Oid *dbids; + int i; + bool attach; IIUC many of these variables can be declared at a different scope in this function, so they will be closer to where they are used. ~~~ 15. + /* + * We need to do the modification of the shared memory under lock so that + * we have consistent view. + */ + LWLockAcquire(SlotSyncWorkerLock, LW_EXCLUSIVE); The current comment seems too much. SUGGESTION The shared memory must only be modified under lock. ~~~ 16. + /* Find unused worker slot. */ + for (i = 0; i < max_slotsync_workers; i++) + { + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; + + if (!w->hdr.in_use) + { + worker = w; + worker_slot = i; + break; + } + } + + /* + * If all the workers are currently in use. Find the one with minimum + * number of dbs and use that. + */ + if (!worker) + { + for (i = 0; i < max_slotsync_workers; i++) + { + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; + + if (i == 0) + { + mindbcnt = w->dbcount; + worker = w; + worker_slot = i; + } + else if (w->dbcount < mindbcnt) + { + mindbcnt = w->dbcount; + worker = w; + worker_slot = i; + } + } + } Why not combine these 2 loops, to avoid iterating over the same slots twice? Then, exit the loop immediately if unused worker found, otherwise if reach the end of loop having not found anything unused then you will already know the one having least dbs. ~~~ 17. + /* Remember the old dbids before we reallocate dsa. */ + copied_dbcnt = worker->dbcount; + copied_dbids = (Oid *) palloc0(worker->dbcount * sizeof(Oid)); + memcpy(copied_dbids, dbids, worker->dbcount * sizeof(Oid)); 17a. Who frees this copied_dbids memory when you are finished needed it. It seems allocated in the TopMemoryContext so IIUC this is a leak. ~ 17b. These are the 'old' values. Not the 'copied' values. The copied_xxx variable names seem misleading. ~~~ 18. + /* Prepare the new worker. */ + worker->hdr.launch_time = GetCurrentTimestamp(); + worker->hdr.in_use = true; If a new worker is required then the launch_time is set like above. + { + slot_db_data->last_launch_time = now; + + slotsync_worker_launch_or_reuse(slot_db_data->database); + } Meanwhile, at the caller of slotsync_worker_launch_or_reuse(), the dbid launch_time was already set as well. And those two timestamps are almost (but not quite) the same value. Isn't that a bit strange? ~~~ 19. + /* Initial DSA setup for dbids array to hold DB_PER_WORKER_ALLOC_INIT dbs */ + handle = slotsync_dsa_setup(worker, DB_PER_WORKER_ALLOC_INIT); + dbids = (Oid *) dsa_get_address(worker->dbids_dsa, worker->dbids_dp); + + dbids[worker->dbcount++] = dbid; Where was this worker->dbcount assigned to 0? Maybe it's better to do this explicity under the "/* Prepare the new worker. */" comment. ~~~ 20. 
+ if (!attach) + ereport(WARNING, + (errmsg("Replication slot-sync worker failed to attach to " + "worker-pool slot %d", worker_slot))); + + /* Attach is done, now safe to log that the worker is managing dbid */ + if (attach) + ereport(LOG, + (errmsg("Added database %d to replication slot-sync " + "worker %d; dbcount now: %d", + dbid, worker_slot, worker->dbcount))); 20a. IMO this should be coded as "if (attach) ...; else ..." ~ 99b. In other code if it failed to register then slotsync_worker_cleanup code is called. How come similar code is not done when fails to attach? ~~~ 21. slotsync_worker_stop_internal +/* + * Internal function to stop the slot-sync worker and wait until it detaches + * from the slot-sync worker-pool slot. + */ +static void +slotsync_worker_stop_internal(SlotSyncWorker *worker) IIUC this function does a bit more than what the function comment says. IIUC (again) I think the "detached" worker slot will still be flagged as 'inUse' but this function then does the extra step of calling slotsync_worker_cleanup() function to make the worker slot available for next process that needs it, am I correct? In this regard, this function seems a lot more like logicalrep_worker_detach() function comment, so there seems some kind of muddling of the different function names here... (??). ~~~ 22. slotsync_remove_obsolete_dbs This function says: +/* + * Slot-sync workers remove obsolete DBs from db-list + * + * If the DBIds fetched from the primary are lesser than the ones being managed + * by slot-sync workers, remove extra dbs from worker's db-list. This may happen + * if some slots are removed on primary but 'synchronize_slot_names' has not + * been changed yet. + */ +static void +slotsync_remove_obsolete_dbs(List *remote_dbs) But, there was another similar logic function too: +/* + * Drop obsolete slots + * + * Drop the slots which no longer need to be synced i.e. these either + * do not exist on primary or are no longer part of synchronize_slot_names. + * + * Also drop the slots which are valid on primary and got invalidated + * on standby due to conflict (say required rows removed on primary). + * The assumption is, these will get recreated in next sync-cycle and + * it is okay to drop and recreate such slots as long as these are not + * consumable on standby (which is the case currently). + */ +static void +drop_obsolete_slots(Oid *dbids, List *remote_slot_list) Those function header comments suggest these have a lot of overlapping functionality. Can't those 2 functions be combined? Or maybe one delegate to the other? ~~~ 23. + ListCell *lc; + Oid *dbids; + int widx; + int dbidx; + int i; Scope of some of these variable declarations can be different so they are declared closer to where they are used. ~~~ 24. + /* If not found, then delete this db from worker's db-list */ + if (!found) + { + for (i = dbidx; i < worker->dbcount; i++) + { + /* Shift the DBs and get rid of wdbid */ + if (i < (worker->dbcount - 1)) + dbids[i] = dbids[i + 1]; + } IIUC, that shift/loop could just have been a memmove() call to remove one Oid element. ~~~ 25. + /* If dbcount for any worker has become 0, shut it down */ + for (widx = 0; widx < max_slotsync_workers; widx++) + { + SlotSyncWorker *worker = &LogicalRepCtx->ss_workers[widx]; + + if (worker->hdr.in_use && !worker->dbcount) + slotsync_worker_stop_internal(worker); + } Is it safe to stop this unguarded by SlotSyncWorkerLock locking? Is there a window where another dbid decides to reuse this worker at the same time this process is about to stop it? 
~~~ 26. primary_connect +/* + * Connect to primary server for slotsync purpose and return the connection + * info. Disconnect previous connection if provided in wrconn_prev. + */ /primary server/the primary server/ ~~~ 27. + if (!RecoveryInProgress()) + return NULL; + + if (max_slotsync_workers == 0) + return NULL; + + if (strcmp(synchronize_slot_names, "") == 0) + return NULL; + + /* The primary_slot_name is not set */ + if (!WalRcv || WalRcv->slotname[0] == '\0') + { + ereport(WARNING, + errmsg("Skipping slots synchronization as primary_slot_name " + "is not set.")); + return NULL; + } + + /* The hot_standby_feedback must be ON for slot-sync to work */ + if (!hot_standby_feedback) + { + ereport(WARNING, + errmsg("Skipping slots synchronization as hot_standby_feedback " + "is off.")); + return NULL; + } How come some of these checks giving WARNING that slot synchronization will be skipped, but others are just silently returning NULL? ~~~ 28. SaveCurrentSlotSyncConfigs +static void +SaveCurrentSlotSyncConfigs() +{ + PrimaryConnInfoPreReload = pstrdup(PrimaryConnInfo); + PrimarySlotNamePreReload = pstrdup(WalRcv->slotname); + SyncSlotNamesPreReload = pstrdup(synchronize_slot_names); +} Shouldn't this code also do pfree first? Otherwise these will slowly leak every time this function is called, right? ~~~ 29. SlotSyncConfigsChanged +static bool +SlotSyncConfigsChanged() +{ + if (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0) + return true; + + if (strcmp(PrimarySlotNamePreReload, WalRcv->slotname) != 0) + return true; + + if (strcmp(SyncSlotNamesPreReload, synchronize_slot_names) != 0) + return true; I felt those can all be combined to have 1 return instead of 3. ~~~ 30. + /* + * If we have reached this stage, it means original value of + * hot_standby_feedback was 'true', so consider it changed if 'false' now. + */ + if (!hot_standby_feedback) + return true; "If we have reached this stage" seems a bit vague. Can this have some more explanation? And, maybe also an Assert(hot_standby_feedback); is helpful in the calling code (before the config is reloaded)? ~~~ 31. ApplyLauncherStartSlotSync + * It connects to primary, get the list of DBIDs for slots configured in + * synchronize_slot_names. It then launces the slot-sync workers as per + * max_slotsync_workers and then assign the DBs equally to the workers + * launched. + */ SUGGESTION (fix typos etc) Connect to the primary, to get the list of DBIDs for slots configured in synchronize_slot_names. Then launch slot-sync workers (limited by max_slotsync_workers) where the DBs are distributed equally among those workers. ~~~ 32. +static void +ApplyLauncherStartSlotSync(long *wait_time, WalReceiverConn *wrconn) Why does this function even have 'Apply' in the name when it is nothing to do with an apply worker; looks like some cut/paste hangover. How about calling it something like 'LaunchSlotSyncWorkers' ~~~ 33. + /* If connection is NULL due to lack of correct configurations, return */ + if (!wrconn) + return; IMO it would be better to Assert wrconn in this function. If it is NULL then it should be checked a the caller, otherwise it just raises more questions -- like "who logged the warning about bad configuration" etc (which I already questions the NULL returns of primary_connect. ~~~ 34. + if (!OidIsValid(slot_db_data->database)) + continue; This represents some kind of integrity error doesn't it? Is it really OK just to silently skip such a thing? ~~~ 35. + /* + * If the worker is eligible to start now, launch it. 
Otherwise, + * adjust wait_time so that we'll wake up as soon as it can be + * started. + * + * Each apply worker can only be restarted once per + * wal_retrieve_retry_interval, so that errors do not cause us to + * repeatedly restart the worker as fast as possible. + */ 35a. I found the "we" part of "so that we'll wake up..." to be a bit misleading. There is no waiting in this function; that wait value is handed back to the caller to deal with. TBH, I did not really understand why it is even necessary tp separate the waiting calculation *per-worker* like this. It seems to overcomplicate things and it might even give results like 1st worker is not started but last works is started (if enough time elapsed in the loop). Why can't all this wait logic be done one time up front, and either (a) start all necessary workers, or (b) start none of them and wait a bit longer. ~ 35b. "Each apply worker". Why is this talking about "apply" workers? Maybe cut/paste error? ~~~ 36. + last_launch_tried = slot_db_data->last_launch_time; + now = GetCurrentTimestamp(); + if (last_launch_tried == 0 || + (elapsed = TimestampDifferenceMilliseconds(last_launch_tried, now)) >= + wal_retrieve_retry_interval) + { + slot_db_data->last_launch_time = now; + + slotsync_worker_launch_or_reuse(slot_db_data->database); + } + else + { + *wait_time = Min(*wait_time, + wal_retrieve_retry_interval - elapsed); + } 36a. IMO this might be simpler if you add another variable like bool 'launch_now': last_launch_tried = ... now = ... elapsed = ... launch_now = elapsed >= wal_retrieve_retry_interval; ~ 36b. Do you really care about checking "last_launch_tried == 0"; If it really is zero, then I thought the elapsed check should be enough. ~ 36c. Does this 'last_launch_time' really need to be in some shared memory? Won't a static variable suffice? ~~~ 37. ApplyLauncherStartSubs Wouldn't a better name for the function be something like 'LaunchSubscriptionApplyWorker'? (it is a better match for the suggested LaunchSlotSyncWorkers) ~~~ 38. ApplyLauncherMain Now that this is not only for Apply worker but also for SlotSync workers, maybe this function should be renamed as just LauncherMain, or something equally generic? ~~~ 39. + load_file("libpqwalreceiver", false); + + wrconn = primary_connect(NULL); + This connection did not exist in the HEAD code so I think it is added only for the slot-sync logic. IIUC it is still doing nothing for the non-slot-sync cases because primary_connect will silently return in that case: + if (!RecoveryInProgress()) + return NULL; IMO this is too sneaky, and it is misleading to see the normal apply worker launch apparently ccnnecting to something when it is not really doing so AFAIK. I think these conditions should be done explicity here at the caller to remove any such ambiguity. ~~~ 40. + if (!RecoveryInProgress()) + ApplyLauncherStartSubs(&wait_time); + else + ApplyLauncherStartSlotSync(&wait_time, wrconn); 40a. IMO this is deserving of a comment to explain why RecoveryInProgress means to perform the slot-synchronization. ~ 40b. Also, better to have positive check RecoveryInProgress() instead of !RecoveryInProgress() ~~~ 41. if (ConfigReloadPending) { + bool ssConfigChanged = false; + + SaveCurrentSlotSyncConfigs(); + ConfigReloadPending = false; ProcessConfigFile(PGC_SIGHUP); + + /* + * Stop the slot-sync workers if any of the related GUCs changed. + * These will be relaunched as per the new values during next + * sync-cycle. 
+ */ + ssConfigChanged = SlotSyncConfigsChanged(); + if (ssConfigChanged) + slotsync_workers_stop(); + + /* Reconnect in case primary_conninfo has changed */ + wrconn = primary_connect(wrconn); } } ~ 41a. The 'ssConfigChanged' assignement at declaration is not needed. Indeed, the whole variable is not really necessary because it is used only once. ~ 41b. /as per the new values/using the new values/ ~ 41c. + /* Reconnect in case primary_conninfo has changed */ + wrconn = primary_connect(wrconn); To avoid unnecessary reconnections, shouldn't this be done only if (ssConfigChanged). In fact, assuming the comment is correct, reconnect only if (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0) ====== src/backend/replication/logical/slotsync.c 42. wait_for_primary_slot_catchup + ereport(LOG, + errmsg("waiting for remote slot \"%s\" LSN (%u/%X) and catalog xmin" + " (%u) to pass local slot LSN (%u/%X) and and catalog xmin (%u)", + remote_slot->name, + LSN_FORMAT_ARGS(remote_slot->restart_lsn), + remote_slot->catalog_xmin, + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), + MyReplicationSlot->data.catalog_xmin)); AFAIK it is usual for the LSN format string to be %X/%X (not %u/%X like here). ~~~ 43. + appendStringInfo(&cmd, + "SELECT restart_lsn, confirmed_flush_lsn, catalog_xmin" + " FROM pg_catalog.pg_replication_slots" + " WHERE slot_name = %s", + quote_literal_cstr(remote_slot->name)); double space before FROM? ~~~ 44. synchronize_one_slot + /* + * We might not have the WALs retained locally corresponding to + * remote's restart_lsn if our local restart_lsn and/or local + * catalog_xmin is ahead of remote's one. And thus we can not create + * the local slot in sync with primary as that would mean moving local + * slot backward. Thus wait for primary's restart_lsn and catalog_xmin + * to catch up with the local ones and then do the sync. + */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || + TransactionIdPrecedes(remote_slot->catalog_xmin, + MyReplicationSlot->data.catalog_xmin)) + { + if (!wait_for_primary_slot_catchup(wrconn, remote_slot)) + { + /* + * The remote slot didn't catch up to locally reserved + * position + */ + ReplicationSlotRelease(); + CommitTransactionCommand(); + return; + } SUGGESTION (comment is slightly simplified) If the local restart_lsn and/or local catalog_xmin is ahead of those on the remote then we cannot create the local slot in sync with primary because that would mean moving local slot backwards. In this case we will wait for primary's restart_lsn and catalog_xmin to catch up with the local one before attempting the sync. ====== Kind Regards, Peter Smith. Fujitsu Australia
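As a small standalone demonstration of the memmove() suggestion in review comment 24 above (removing one entry from the dbids array by shifting the tail down, instead of an element-by-element loop):

```c
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Remove the element at position idx by shifting the tail down one slot. */
static void
remove_dbid_at(Oid *dbids, int *dbcount, int idx)
{
	memmove(&dbids[idx], &dbids[idx + 1],
			(*dbcount - idx - 1) * sizeof(Oid));
	(*dbcount)--;
}

int
main(void)
{
	Oid			dbids[] = {16384, 16385, 16386, 16387};
	int			dbcount = 4;

	remove_dbid_at(dbids, &dbcount, 1);	/* drop 16385 */

	for (int i = 0; i < dbcount; i++)
		printf("%u\n", dbids[i]);
	return 0;
}
```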
On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > On 10/3/23 12:54 PM, Amit Kapila wrote: > > > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > >> > > > >> On 9/29/23 1:33 PM, Amit Kapila wrote: > > > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand > > > >>> <bertranddrouvot.pg@gmail.com> wrote: > > > >>>> > > > >>> > > > >>>> - probably open corner cases like: what if a standby is down? would that mean > > > >>>> that synchronize_slot_names not being send to the primary would allow the decoding > > > >>>> on the primary to go ahead? > > > >>>> > > > >>> > > > >>> Good question. BTW, irrespective of whether we have > > > >>> 'standby_slot_names' parameters or not, how should we behave if > > > >>> standby is down? Say, if 'synchronize_slot_names' is only specified on > > > >>> standby then in such a situation primary won't be even aware that some > > > >>> of the logical walsenders need to wait. > > > >> > > > >> Exactly, that's why I was thinking keeping standby_slot_names to address > > > >> this scenario. In such a case one could simply decide to keep or remove > > > >> the associated physical replication slot from standby_slot_names. Keep would > > > >> mean "wait" and removing would mean allow to decode on the primary. > > > >> > > > >>> OTOH, one can say that users > > > >>> should configure 'synchronize_slot_names' on both primary and standby > > > >>> but note that this value could be different for different standby's, > > > >>> so we can't configure it on primary. > > > >>> > > > >> > > > >> Yeah, I think that's a good use case for standby_slot_names, what do you think? > > > >> > > > > > > > > But, even if we keep 'standby_slot_names' for this purpose, the > > > > primary doesn't know the value of 'synchronize_slot_names' once the > > > > standby is down and or the primary is restarted. So, how will we know > > > > which logical WAL senders needs to wait for 'standby_slot_names'? > > > > > > > > > > Yeah right, I also think we'd need: > > > > > > - synchronize_slot_names on both primary and standby > > > > > > But now we would need to take care of different standby having different values ( > > > as you said up-thread).... > > > > > > Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor > > > synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say > > > "logical_slot_name:physical_slot". > > > > > > I think this GUC would help us define each walsender behavior (should the standby(s) > > > be up or down): > > > > > > > It may help in defining the walsender's behaviour better for sure. But > > the problem I see once we start defining sync-slot-names on primary > > (in any form whether as independent GUC or as above mapping GUC) is > > that it needs to be then in sync with standbys, as each standby for > > sure needs to maintain its own sync-slot-names GUC to make it aware of > > what all it needs to sync. > > Yes, I also think so. Also, defining such a GUC where user wants to > sync all the slots which would normally be the case would be a night > mare for the users. 
> > > > > This brings us to the original question of > > how do we actually keep these configurations in sync between primary > > and standby if we plan to maintain it on both? > > > > > > > - don't wait if its associated logical_slot is not listed in this GUC > > > - or wait based on its associated "list" of mapped physical slots (would probably > > > have to deal with the min restart_lsn for all the corresponding mapped ones). > > > > > > I don't think we can avoid having to define at least one GUC on the primary (at least to > > > handle the case of standby(s) being down). > > > > > How about an alternate scheme where we define sync_slot_names on > standby but then store the physical_slot_name in the corresponding > logical slot (ReplicationSlotPersistentData) to be synced? So, the > standby will send the list of 'sync_slot_names' and the primary will > add the physical standby's slot_name in each of the corresponding > sync_slot. Now, if we do this then even after restart, we should be > able to know for which physical slot each logical slot needs to wait. > We can even provide an SQL API to reset the value of > standby_slot_names in logical slots as a way to unblock decoding in > case of emergency (for example, corresponding when physical standby > never comes up). > Looks like a better approach to me. It solves most of the pain points like: 1) Avoids the need of multiple GUCs 2) Primary and standby need not to worry to be in sync if we maintain sync-slot-names GUC on both 3) User still gets the flexibility to remove a standby from wait-lost of primary's logical-walsenders' using reset SQL API. Now some initial thoughts: 1) Since each logical slot could be needed to be synched by multiple physical-standbys, so in ReplicationSlotPersistentData, we need to hold a list of standby's name. So this brings us to question as in how much shall we allocate initially in shared-memory? Shall it be for max_replication_slots (worst case scenario) in each ReplicationSlotPersistentData to hold physical-standby names? 2) If standby sends '*', then we need to update each logical-slot with that standby-name. Or do we have better way to deal with '*'? Need to think more on this. JFYI, on the similar line, currently in ReplicationSlotPersistentData, we are maintaining a flag for slot-sync feature which is: bool synced; /* Is this a slot created by a sync-slot worker? */ This flag currently holds significance only on physical-standby. This has been added to distinguish between a slot created by user for logical decoding purpose and the ones being synced from primary. It is needed when we have to choose obsolete slots (synced ones) to drop on standby or block get_changes on standby for synced slots. It can be reused on primary for above approach if needed. thanks Shveta
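On point 2 (handling '*'): one simple option is to treat a lone '*' as matching every slot at lookup time. A minimal illustrative helper, assuming the GUC value has already been split into a List of name strings (this is not patch code, and the function name is made up):

```c
#include "postgres.h"

#include "nodes/pg_list.h"

/* Does synchronize_slot_names (already split into a List) cover slot_name? */
static bool
slot_in_sync_list(const char *slot_name, List *sync_slot_names)
{
	ListCell   *lc;

	/* a single "*" entry means: synchronize every logical slot */
	if (list_length(sync_slot_names) == 1 &&
		strcmp((char *) linitial(sync_slot_names), "*") == 0)
		return true;

	foreach(lc, sync_slot_names)
	{
		if (strcmp((char *) lfirst(lc), slot_name) == 0)
			return true;
	}

	return false;
}
```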
On Wed, Oct 4, 2023 at 9:56 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On 10/3/23 12:54 PM, Amit Kapila wrote: > > > > > On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand > > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > >> > > > > >> On 9/29/23 1:33 PM, Amit Kapila wrote: > > > > >>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand > > > > >>> <bertranddrouvot.pg@gmail.com> wrote: > > > > >>>> > > > > >>> > > > > >>>> - probably open corner cases like: what if a standby is down? would that mean > > > > >>>> that synchronize_slot_names not being send to the primary would allow the decoding > > > > >>>> on the primary to go ahead? > > > > >>>> > > > > >>> > > > > >>> Good question. BTW, irrespective of whether we have > > > > >>> 'standby_slot_names' parameters or not, how should we behave if > > > > >>> standby is down? Say, if 'synchronize_slot_names' is only specified on > > > > >>> standby then in such a situation primary won't be even aware that some > > > > >>> of the logical walsenders need to wait. > > > > >> > > > > >> Exactly, that's why I was thinking keeping standby_slot_names to address > > > > >> this scenario. In such a case one could simply decide to keep or remove > > > > >> the associated physical replication slot from standby_slot_names. Keep would > > > > >> mean "wait" and removing would mean allow to decode on the primary. > > > > >> > > > > >>> OTOH, one can say that users > > > > >>> should configure 'synchronize_slot_names' on both primary and standby > > > > >>> but note that this value could be different for different standby's, > > > > >>> so we can't configure it on primary. > > > > >>> > > > > >> > > > > >> Yeah, I think that's a good use case for standby_slot_names, what do you think? > > > > >> > > > > > > > > > > But, even if we keep 'standby_slot_names' for this purpose, the > > > > > primary doesn't know the value of 'synchronize_slot_names' once the > > > > > standby is down and or the primary is restarted. So, how will we know > > > > > which logical WAL senders needs to wait for 'standby_slot_names'? > > > > > > > > > > > > > Yeah right, I also think we'd need: > > > > > > > > - synchronize_slot_names on both primary and standby > > > > > > > > But now we would need to take care of different standby having different values ( > > > > as you said up-thread).... > > > > > > > > Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor > > > > synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say > > > > "logical_slot_name:physical_slot". > > > > > > > > I think this GUC would help us define each walsender behavior (should the standby(s) > > > > be up or down): > > > > > > > > > > It may help in defining the walsender's behaviour better for sure. But > > > the problem I see once we start defining sync-slot-names on primary > > > (in any form whether as independent GUC or as above mapping GUC) is > > > that it needs to be then in sync with standbys, as each standby for > > > sure needs to maintain its own sync-slot-names GUC to make it aware of > > > what all it needs to sync. > > > > Yes, I also think so. 
Also, defining such a GUC where user wants to > > sync all the slots which would normally be the case would be a night > > mare for the users. > > > > > > > > This brings us to the original question of > > > how do we actually keep these configurations in sync between primary > > > and standby if we plan to maintain it on both? > > > > > > > > > > - don't wait if its associated logical_slot is not listed in this GUC > > > > - or wait based on its associated "list" of mapped physical slots (would probably > > > > have to deal with the min restart_lsn for all the corresponding mapped ones). > > > > > > > > I don't think we can avoid having to define at least one GUC on the primary (at least to > > > > handle the case of standby(s) being down). > > > > > > > > How about an alternate scheme where we define sync_slot_names on > > standby but then store the physical_slot_name in the corresponding > > logical slot (ReplicationSlotPersistentData) to be synced? So, the > > standby will send the list of 'sync_slot_names' and the primary will > > add the physical standby's slot_name in each of the corresponding > > sync_slot. Now, if we do this then even after restart, we should be > > able to know for which physical slot each logical slot needs to wait. > > We can even provide an SQL API to reset the value of > > standby_slot_names in logical slots as a way to unblock decoding in > > case of emergency (for example, corresponding when physical standby > > never comes up). > > > > > Looks like a better approach to me. It solves most of the pain points like: > 1) Avoids the need of multiple GUCs > 2) Primary and standby need not to worry to be in sync if we maintain > sync-slot-names GUC on both > 3) User still gets the flexibility to remove a standby from wait-lost > of primary's logical-walsenders' using reset SQL API. > > Now some initial thoughts: > 1) Since each logical slot could be needed to be synched by multiple > physical-standbys, so in ReplicationSlotPersistentData, we need to > hold a list of standby's name. So this brings us to question as in how > much shall we allocate initially in shared-memory? Shall it be for > max_replication_slots (worst case scenario) in each > ReplicationSlotPersistentData to hold physical-standby names? > > 2) If standby sends '*', then we need to update each logical-slot with > that standby-name. Or do we have better way to deal with '*'? Need to > think more on this. > > JFYI, on the similar line, currently in ReplicationSlotPersistentData, > we are maintaining a flag for slot-sync feature which is: > > bool synced; /* Is this a slot created by a > sync-slot worker? */ > > This flag currently holds significance only on physical-standby. This > has been added to distinguish between a slot created by user for > logical decoding purpose and the ones being synced from primary. It is > needed when we have to choose obsolete slots (synced ones) to drop on > standby or block get_changes on standby for synced slots. It can be > reused on primary for above approach if needed. > > thanks > Shveta The most simplistic approach would be: 1) maintain standby_slot_names GUC on primary 2) maintain synchronize_slot_names GUC on physical standby alone. On primary, let all logical-walsenders wait on physical-standbys configured in standby_slot_names GUC. This will work and will avoid all the complexity involved in designs discussed above. 
But this simplistic approach comes with disadvantages such as the following: 1) Even if the associated slot of a logical walsender is not part of synchronize_slot_names of any of the physical standbys, it still waits for all the configured standbys to finish. 2) If the associated slot of a logical walsender is part of synchronize_slot_names of standby1, it still waits on standby2, standby3, etc. to finish, i.e. it waits on the rest of the standbys configured in standby_slot_names even though they have not marked that logical slot in their synchronize_slot_names. So we need to weigh our options here. thanks Shveta
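For concreteness, the simplistic scheme above amounts to something like the settings below. standby_slot_names and synchronize_slot_names are the GUCs proposed in this thread rather than existing parameters, and the slot names are made up:

    -- On the primary: logical walsenders wait for these physical slots to
    -- confirm before sending decoded changes to subscribers (proposed GUC).
    ALTER SYSTEM SET standby_slot_names = 'standby1_phys, standby2_phys';
    SELECT pg_reload_conf();

    -- On each physical standby: the logical slots its sync worker should keep
    -- in sync with the primary (proposed GUC).
    ALTER SYSTEM SET synchronize_slot_names = 'sub1_slot';
    SELECT pg_reload_conf();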
On Mon, Oct 2, 2023 at 4:29 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > Thank you for updating the patch! > > I found another ERROR due to the slot removal. Is this a real issue? > > 1. applied add_sleep.txt, which emulated the case the tablesync worker stucked > and the primary crashed during the > initial sync. > 2. executed test_0925_v2.sh (You attached in [1]) > 3. secondary could not start the logical replication because the slot was not > created (log files were also attached). > > > Here is my analysis. The cause is that the slotsync worker aborts the slot creation > on secondary server because the restart_lsn of secondary ahead the primary's one. > IIUC it can be occurred when tablesync workers finishes initial copy before > walsenders stream changes. In this case, the relstate of the worker is set to > SUBREL_STATE_CATCHUP and the apply worker waits till the relation becomes > SUBREL_STATE_SYNCDONE. From here the slot on primary will not be updated until > the relation is caught up. If some changes are come and the primary crashes at > that time, the syncslot worker will abort the slot creation. > Kuroda-San, we need to let slot-creation on standby finish before we start expecting it to support logical replication on failover. In the current case, as you stated the slot-creation itself is aborted and thus it can not support logical-replication later. We are currently trying to think of possibilities to advance remote_lsn on primary internally by slot-sync workers in order to accelerate slot-creation on standby for cases where slot-creation is stuck due to primary's restart_lsn lagging behind standby's restart_lsn. But till then, the way to proceed for testing is to execute workload on primary for such cases in order to accelerate slot-creation. thanks Shveta
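As a minimal sketch of that testing workaround, assuming a throwaway table on the primary, something like the following is enough to generate WAL activity so that the primary's slot positions can advance and the standby's slot creation can complete:

    -- Run on the primary purely to produce some WAL and slot activity for testing.
    CREATE TABLE IF NOT EXISTS slot_sync_workload (id int);
    INSERT INTO slot_sync_workload SELECT generate_series(1, 1000);
    SELECT pg_switch_wal();  -- optionally force a WAL segment switch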
Hi, On 10/4/23 6:26 AM, shveta malik wrote: > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Tue, Oct 3, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: >>> >>> On Tue, Oct 3, 2023 at 7:56 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> On 10/3/23 12:54 PM, Amit Kapila wrote: >>>>> On Mon, Oct 2, 2023 at 11:39 AM Drouvot, Bertrand >>>>> <bertranddrouvot.pg@gmail.com> wrote: >>>>>> >>>>>> On 9/29/23 1:33 PM, Amit Kapila wrote: >>>>>>> On Thu, Sep 28, 2023 at 6:31 PM Drouvot, Bertrand >>>>>>> <bertranddrouvot.pg@gmail.com> wrote: >>>>>>>> >>>>>>> >>>>>>>> - probably open corner cases like: what if a standby is down? would that mean >>>>>>>> that synchronize_slot_names not being send to the primary would allow the decoding >>>>>>>> on the primary to go ahead? >>>>>>>> >>>>>>> >>>>>>> Good question. BTW, irrespective of whether we have >>>>>>> 'standby_slot_names' parameters or not, how should we behave if >>>>>>> standby is down? Say, if 'synchronize_slot_names' is only specified on >>>>>>> standby then in such a situation primary won't be even aware that some >>>>>>> of the logical walsenders need to wait. >>>>>> >>>>>> Exactly, that's why I was thinking keeping standby_slot_names to address >>>>>> this scenario. In such a case one could simply decide to keep or remove >>>>>> the associated physical replication slot from standby_slot_names. Keep would >>>>>> mean "wait" and removing would mean allow to decode on the primary. >>>>>> >>>>>>> OTOH, one can say that users >>>>>>> should configure 'synchronize_slot_names' on both primary and standby >>>>>>> but note that this value could be different for different standby's, >>>>>>> so we can't configure it on primary. >>>>>>> >>>>>> >>>>>> Yeah, I think that's a good use case for standby_slot_names, what do you think? >>>>>> >>>>> >>>>> But, even if we keep 'standby_slot_names' for this purpose, the >>>>> primary doesn't know the value of 'synchronize_slot_names' once the >>>>> standby is down and or the primary is restarted. So, how will we know >>>>> which logical WAL senders needs to wait for 'standby_slot_names'? >>>>> >>>> >>>> Yeah right, I also think we'd need: >>>> >>>> - synchronize_slot_names on both primary and standby >>>> >>>> But now we would need to take care of different standby having different values ( >>>> as you said up-thread).... >>>> >>>> Thinking out loud: What about a single GUC on the primary (not standby_slot_names nor >>>> synchronize_slot_names) but say logical_slots_wait_for_standby that could be a list of say >>>> "logical_slot_name:physical_slot". >>>> >>>> I think this GUC would help us define each walsender behavior (should the standby(s) >>>> be up or down): >>>> >>> >>> It may help in defining the walsender's behaviour better for sure. But >>> the problem I see once we start defining sync-slot-names on primary >>> (in any form whether as independent GUC or as above mapping GUC) is >>> that it needs to be then in sync with standbys, as each standby for >>> sure needs to maintain its own sync-slot-names GUC to make it aware of >>> what all it needs to sync. >> >> Yes, I also think so. Also, defining such a GUC where user wants to >> sync all the slots which would normally be the case would be a night >> mare for the users. >> >>> >>> This brings us to the original question of >>> how do we actually keep these configurations in sync between primary >>> and standby if we plan to maintain it on both? 
>>> >>> >>>> - don't wait if its associated logical_slot is not listed in this GUC >>>> - or wait based on its associated "list" of mapped physical slots (would probably >>>> have to deal with the min restart_lsn for all the corresponding mapped ones). >>>> >>>> I don't think we can avoid having to define at least one GUC on the primary (at least to >>>> handle the case of standby(s) being down). >>>> >> >> How about an alternate scheme where we define sync_slot_names on >> standby but then store the physical_slot_name in the corresponding >> logical slot (ReplicationSlotPersistentData) to be synced? So, the >> standby will send the list of 'sync_slot_names' and the primary will >> add the physical standby's slot_name in each of the corresponding >> sync_slot. Now, if we do this then even after restart, we should be >> able to know for which physical slot each logical slot needs to wait. >> We can even provide an SQL API to reset the value of >> standby_slot_names in logical slots as a way to unblock decoding in >> case of emergency (for example, corresponding when physical standby >> never comes up). >> > > > Looks like a better approach to me. It solves most of the pain points like: > 1) Avoids the need of multiple GUCs > 2) Primary and standby need not to worry to be in sync if we maintain > sync-slot-names GUC on both > 3) User still gets the flexibility to remove a standby from wait-lost > of primary's logical-walsenders' using reset SQL API. > Fully agree. > Now some initial thoughts: > 1) Since each logical slot could be needed to be synched by multiple > physical-standbys, so in ReplicationSlotPersistentData, we need to > hold a list of standby's name. So this brings us to question as in how > much shall we allocate initially in shared-memory? Shall it be for > max_replication_slots (worst case scenario) in each > ReplicationSlotPersistentData to hold physical-standby names? > Yeah, and even if we do the opposite means add the 'to-sync' logical replication slot in the ReplicationSlotPersistentData of the physical slot(s) the questions still remain (as a physical standby could want to sync multiples slots) > 2) If standby sends '*', then we need to update each logical-slot with > that standby-name. Or do we have better way to deal with '*'? Need to > think more on this. > > JFYI, on the similar line, currently in ReplicationSlotPersistentData, > we are maintaining a flag for slot-sync feature which is: > > bool synced; /* Is this a slot created by a > sync-slot worker? */ > > This flag currently holds significance only on physical-standby. This > has been added to distinguish between a slot created by user for > logical decoding purpose and the ones being synced from primary. BTW, what about having this "user visible" through pg_replication_slots? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
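To illustrate that last suggestion, exposing the flag could look roughly like the query below when run on a standby; the synced column is only the proposed addition being discussed here, not something pg_replication_slots currently provides:

    -- Hypothetical 'synced' column: distinguishes slots copied from the primary
    -- by the slot-sync worker from logical slots created locally by a user.
    SELECT slot_name, database, synced
      FROM pg_catalog.pg_replication_slots
     WHERE slot_type = 'logical';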
Hi, On 10/4/23 7:00 AM, shveta malik wrote: > On Wed, Oct 4, 2023 at 9:56 AM shveta malik <shveta.malik@gmail.com> wrote: > The most simplistic approach would be: > > 1) maintain standby_slot_names GUC on primary > 2) maintain synchronize_slot_names GUC on physical standby alone. > > On primary, let all logical-walsenders wait on physical-standbys > configured in standby_slot_names GUC. This will work and will avoid > all the complexity involved in designs discussed above. But this > simplistic approach comes with disadvantages like below: > > 1) Even if the associated slot of logical-walsender is not part of > synchronize_slot_names of any of the physical-standbys, it is still > waiting for all the configured standbys to finish. That's right. Currently (with the walsender waiting an arbitrary amount of time) that sounds like a -1. But if we're going with a new CV approach (like the one proposed in [1]), that might not be so terrible. Though I don't feel comfortable with waiting for no reason (even if it is possibly only for a short amount of time). > 2) If associated slot of logical walsender is part of > synchronize_slot_names of standby1, it is still waiting on standby2,3 > etc to finish i.e. waiting on rest of the standbys configured in > standby_slot_names which have not even marked that logical slot in > their synchronize_slot_names. > Same thoughts as above for 1) > So we need to weigh our options here. > With the simplistic approach, if a standby goes down, that would impact unrelated walsenders on the primary until the standby's associated physical slot is removed from standby_slot_names, and I don't feel comfortable with this behavior. So, I'm +1 for the ReplicationSlotPersistentData approach proposed by Amit. [1]: https://www.postgresql.org/message-id/CAA4eK1LNjgL6Lghgu1PcDfuoOfa8Ug4J7Uv-H%3DBPP8Wgf1%2BpOw%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/4/23 6:26 AM, shveta malik wrote: > > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> > >> How about an alternate scheme where we define sync_slot_names on > >> standby but then store the physical_slot_name in the corresponding > >> logical slot (ReplicationSlotPersistentData) to be synced? So, the > >> standby will send the list of 'sync_slot_names' and the primary will > >> add the physical standby's slot_name in each of the corresponding > >> sync_slot. Now, if we do this then even after restart, we should be > >> able to know for which physical slot each logical slot needs to wait. > >> We can even provide an SQL API to reset the value of > >> standby_slot_names in logical slots as a way to unblock decoding in > >> case of emergency (for example, corresponding when physical standby > >> never comes up). > >> > > > > > > Looks like a better approach to me. It solves most of the pain points like: > > 1) Avoids the need of multiple GUCs > > 2) Primary and standby need not to worry to be in sync if we maintain > > sync-slot-names GUC on both As per my understanding of this approach, we don't want 'sync-slot-names' to be set on the primary. Do you have a different understanding? > > 3) User still gets the flexibility to remove a standby from wait-lost > > of primary's logical-walsenders' using reset SQL API. > > > > Fully agree. > > > Now some initial thoughts: > > 1) Since each logical slot could be needed to be synched by multiple > > physical-standbys, so in ReplicationSlotPersistentData, we need to > > hold a list of standby's name. So this brings us to question as in how > > much shall we allocate initially in shared-memory? Shall it be for > > max_replication_slots (worst case scenario) in each > > ReplicationSlotPersistentData to hold physical-standby names? > > > > Yeah, and even if we do the opposite means add the 'to-sync' > logical replication slot in the ReplicationSlotPersistentData of the physical > slot(s) the questions still remain (as a physical standby could want to > sync multiples slots) > I think we don't need to allocate the entire max_replication_slots array in ReplicationSlotPersistentData. We should design something like the variable amount of data to be written on disk should be represented similar to what we do with variable TransactionIds in SnapBuildOnDisk. Now, we also need to store the list of standby's in-memory either shared or local memory of walsender. I think storing it in shared-memory say in ReplicationSlot has the advantage that we can easily set that via physical walsender and it may be easier to maintain both for manually created logical slots and logical slots associated with logical walsenders. But still this needs some thoughts as to what is the best way to store this information. > > 2) If standby sends '*', then we need to update each logical-slot with > > that standby-name. Or do we have better way to deal with '*'? Need to > > think more on this. > > I can't see any better way. > > JFYI, on the similar line, currently in ReplicationSlotPersistentData, > > we are maintaining a flag for slot-sync feature which is: > > > > bool synced; /* Is this a slot created by a > > sync-slot worker? */ > > > > This flag currently holds significance only on physical-standby. This > > has been added to distinguish between a slot created by user for > > logical decoding purpose and the ones being synced from primary. 
> > BTW, what about having this "user visible" through pg_replication_slots? > We can do that. -- With Regards, Amit Kapila.
On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > On 10/4/23 6:26 AM, shveta malik wrote: > > > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > >> > > >> How about an alternate scheme where we define sync_slot_names on > > >> standby but then store the physical_slot_name in the corresponding > > >> logical slot (ReplicationSlotPersistentData) to be synced? So, the > > >> standby will send the list of 'sync_slot_names' and the primary will > > >> add the physical standby's slot_name in each of the corresponding > > >> sync_slot. Now, if we do this then even after restart, we should be > > >> able to know for which physical slot each logical slot needs to wait. > > >> We can even provide an SQL API to reset the value of > > >> standby_slot_names in logical slots as a way to unblock decoding in > > >> case of emergency (for example, corresponding when physical standby > > >> never comes up). > > >> > > > > > > > > > Looks like a better approach to me. It solves most of the pain points like: > > > 1) Avoids the need of multiple GUCs > > > 2) Primary and standby need not to worry to be in sync if we maintain > > > sync-slot-names GUC on both > > As per my understanding of this approach, we don't want > 'sync-slot-names' to be set on the primary. Do you have a different > understanding? > Same understanding. We do not need it to be set on primary by user. It will be GUC on standby and standby will convey it to primary. > > > 3) User still gets the flexibility to remove a standby from wait-lost > > > of primary's logical-walsenders' using reset SQL API. > > > > > > > Fully agree. > > > > > Now some initial thoughts: > > > 1) Since each logical slot could be needed to be synched by multiple > > > physical-standbys, so in ReplicationSlotPersistentData, we need to > > > hold a list of standby's name. So this brings us to question as in how > > > much shall we allocate initially in shared-memory? Shall it be for > > > max_replication_slots (worst case scenario) in each > > > ReplicationSlotPersistentData to hold physical-standby names? > > > > > > > Yeah, and even if we do the opposite means add the 'to-sync' > > logical replication slot in the ReplicationSlotPersistentData of the physical > > slot(s) the questions still remain (as a physical standby could want to > > sync multiples slots) > > > > I think we don't need to allocate the entire max_replication_slots > array in ReplicationSlotPersistentData. We should design something > like the variable amount of data to be written on disk should be > represented similar to what we do with variable TransactionIds in > SnapBuildOnDisk. Now, we also need to store the list of standby's > in-memory either shared or local memory of walsender. I think storing > it in shared-memory say in ReplicationSlot has the advantage that we > can easily set that via physical walsender and it may be easier to > maintain both for manually created logical slots and logical slots > associated with logical walsenders. But still this needs some thoughts > as to what is the best way to store this information. > Thanks for the idea, I will review this. thanks Shveta
On Wed, Oct 4, 2023 at 12:08 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/4/23 7:00 AM, shveta malik wrote: > > On Wed, Oct 4, 2023 at 9:56 AM shveta malik <shveta.malik@gmail.com> wrote: > > > The most simplistic approach would be: > > > > 1) maintain standby_slot_names GUC on primary > > 2) maintain synchronize_slot_names GUC on physical standby alone. > > > > On primary, let all logical-walsenders wait on physical-standbys > > configured in standby_slot_names GUC. This will work and will avoid > > all the complexity involved in designs discussed above. But this > > simplistic approach comes with disadvantages like below: > > > > 1) Even if the associated slot of logical-walsender is not part of > > synchronize_slot_names of any of the physical-standbys, it is still > > waiting for all the configured standbys to finish. > > That's right. Currently (with walsender waiting an arbitrary amount of time) > that sounds like a -1. But if we're going with a new CV approach (like proposed > in [1], that might not be so terrible). Though I don't feel comfortable with > waiting for no reasons (even if this is for a short amount of time possible). > Agreed. Not a good idea to block each logical walsender. > > 2) If associated slot of logical walsender is part of > > synchronize_slot_names of standby1, it is still waiting on standby2,3 > > etc to finish i.e. waiting on rest of the standbys configured in > > standby_slot_names which have not even marked that logical slot in > > their synchronize_slot_names. > > > > Same thoughts as above for 1) > > > So we need to weigh our options here. > > > > With the simplistic approach, if a standby goes down that would impact non related > walsenders on the primary until the standby's associated physical slot is removed > from standby_slot_names and I don't feel comfortable wit this behavior. > > So, I'm +1 for the ReplicationSlotPersistentData approach proposed by Amit. yes, +1 for ReplicationSlotPersistentData approach. Will start detailed analysis on that approach now. thanks Shveta
Hi, On 10/4/23 1:50 PM, shveta malik wrote: > On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand >> <bertranddrouvot.pg@gmail.com> wrote: >>> >>> On 10/4/23 6:26 AM, shveta malik wrote: >>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>> >>>>> >>>>> How about an alternate scheme where we define sync_slot_names on >>>>> standby but then store the physical_slot_name in the corresponding >>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the >>>>> standby will send the list of 'sync_slot_names' and the primary will >>>>> add the physical standby's slot_name in each of the corresponding >>>>> sync_slot. Now, if we do this then even after restart, we should be >>>>> able to know for which physical slot each logical slot needs to wait. >>>>> We can even provide an SQL API to reset the value of >>>>> standby_slot_names in logical slots as a way to unblock decoding in >>>>> case of emergency (for example, corresponding when physical standby >>>>> never comes up). >>>>> >>>> >>>> >>>> Looks like a better approach to me. It solves most of the pain points like: >>>> 1) Avoids the need of multiple GUCs >>>> 2) Primary and standby need not to worry to be in sync if we maintain >>>> sync-slot-names GUC on both >> >> As per my understanding of this approach, we don't want >> 'sync-slot-names' to be set on the primary. Do you have a different >> understanding? >> > > Same understanding. We do not need it to be set on primary by user. It > will be GUC on standby and standby will convey it to primary. +1, same understanding here. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Sep 27, 2023 at 2:37 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some more review comments for the patch v19-0002. > > This is a WIP.... these review comments are all for the file slotsync.c > > ====== > src/backend/replication/logical/slotsync.c > > 1. wait_for_primary_slot_catchup > > + WalRcvExecResult *res; > + TupleTableSlot *slot; > + Oid slotRow[1] = {LSNOID}; > + StringInfoData cmd; > + bool isnull; > + XLogRecPtr restart_lsn; > + > + for (;;) > + { > + int rc; > > I could not recognize a reason why 'rc' is declared within the loop, > but none of the other local variables are. Personally, I'd declare all > variables at the deepest scope (e.g. inside the for loop). > fixed. > ~~~ > > 2. get_local_synced_slot_names > > +/* > + * Get list of local logical slot names which are synchronized from > + * primary and belongs to one of the DBs passed in. > + */ > +static List * > +get_local_synced_slot_names(Oid *dbids) > +{ > > IIUC, this function gets called only from the drop_obsolete_slots() > function. But I thought this list of local slot names (i.e. for the > dbids that this worker is handling) would be something that perhaps > could the initialized one time for the worker, instead of it being > re-calculated every single time the slots processing/dropping happens. > Isn't the current code expending too much effort recalculating over > and over but giving back the same list every time? > The reason this is being done is because the dblist could be changed at any time by the launcher, which requires us to recalculate the list of slots specific to each workers dblist. > ~~~ > > 3. get_local_synced_slot_names > > + for (int i = 0; i < max_replication_slots; i++) > + { > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > + > + /* Check if it is logical synchronized slot */ > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > + { > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) > + { > > Loop variables are not declared in the common PG code way. > fixed. > ~~~ > > 4. slot_exists_locally > > +static bool > +slot_exists_locally(List *remote_slots, ReplicationSlot *local_slot, > + bool *locally_invalidated) > +{ > + ListCell *cell; > + > + foreach(cell, remote_slots) > + { > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); > + > + if (strcmp(remote_slot->name, NameStr(local_slot->data.name)) == 0) > + { > + /* > + * if remote slot is marked as non-conflicting (i.e. not > + * invalidated) but local slot is marked as invalidated, then set > + * the bool. > + */ > + if (!remote_slot->conflicting && > + SlotIsLogical(local_slot) && > + local_slot->data.invalidated != RS_INVAL_NONE) > + *locally_invalidated = true; > + > + return true; > + } > + } > + > + return false; > +} > > Why is there a SlotIsLogical(local_slot) check buried in this > function? How is slot_exists_locally() getting called with a > non-logical local_slot? Shouldn't that have been screened out long > before here? > Removed that because it is redundant. > ~~~ > > 5. use_slot_in_query > > +static bool > +use_slot_in_query(char *slot_name, Oid *dbids) > > There are multiple non-standard for-loop variable declarations in this function. > fixed. > ~~~ > > 6. compute_naptime > > + * The first slot managed by each worker is chosen for monitoring purpose. > + * If the lsn of that slot changes during each sync-check time, then the > + * nap time is kept at regular value of WORKER_DEFAULT_NAPTIME_MS. 
> + * When no lsn change is observed for WORKER_INACTIVITY_THRESHOLD_MS > + * time, then the nap time is increased to WORKER_INACTIVITY_NAPTIME_MS. > + * This nap time is brought back to WORKER_DEFAULT_NAPTIME_MS as soon as > + * lsn change is observed. > > 6a. > /regular value/the regular value/ > > /for WORKER_INACTIVITY_THRESHOLD_MS time/within the threshold period > (WORKER_INACTIVITY_THRESHOLD_MS)/ > Fixed. > ~ > > 6b. > /as soon as lsn change is observed./as soon as another lsn change is observed./ > fixed. > ~~~ > > 7. > + * The caller is supposed to ignore return-value of 0. The 0 value is returned > + * for the slots other that slot being monitored. > + */ > +static long > +compute_naptime(RemoteSlot *remote_slot) > > This rule about the returning 0 seemed hacky to me. IMO this would be > a better API to pass long *naptime (which this function either updates > or doesn't update, depending on this being the "monitored" slot. > Knowing the current naptime is also useful to improve the function > logic (see the next review comment below). > > Also, since this function is really only toggling naptime between 2 > values, it would be helpful to assert that > > Assert(*naptime == WORKER_DEFAULT_NAPTIME_MS || *naptime == > WORKER_INACTIVITY_NAPTIME_MS); > fixed. > ~~~ > > 8. > + if (NameStr(MySlotSyncWorker->monitoring_info.slot_name)[0] == '\0') > + { > + /* > + * First time, just update the name and lsn and return regular > + * nap time. Start comparison from next time onward. > + */ > + strcpy(NameStr(MySlotSyncWorker->monitoring_info.slot_name), > + remote_slot->name); > > I wasn't sure why it was necessary to identify the "monitoring" slot > by name. Why doesn't the compute_naptime just get called only for the > 1st slot found in the tuple loop instead of all the strcmp business > trying to match monitor names? > > And, if the monitored slot gets "dropped", then so what; next time > another slot will be the first tuple so will automatically take its > place, right? > Yes, that is correct. Fixed as commented. > ~~~ > > 9. > + /* > + * If new received lsn (remote one) is different from what we have in > + * our local slot, then update last_update_time. > + */ > + if (MySlotSyncWorker->monitoring_info.confirmed_lsn != > + remote_slot->confirmed_lsn) > + MySlotSyncWorker->monitoring_info.last_update_time = now; > + > + MySlotSyncWorker->monitoring_info.confirmed_lsn = > + remote_slot->confirmed_lsn; > > Doesn't it make more sense to also put that 'confirmed_lsn' assignment > under the same condition? e.g. No need to overwrite the same value > again. > Fixed. > ~~~ > > 10. > + /* If the inactivity time reaches the threshold, increase nap time */ > + if (TimestampDifferenceExceeds(MySlotSyncWorker->monitoring_info.last_update_time, > + now, WORKER_INACTIVITY_THRESHOLD_MS)) > + return WORKER_INACTIVITY_NAPTIME_MS; > + else > + return WORKER_DEFAULT_NAPTIME_MS; > + } > > Somehow this feels overcomplicated to me. > > In reality, the naptime is only toggling between 2 values (DEFAULT and > INACTIVITY) so we should never need to be testing > TimestampDifferenceExceeds again and again on subsequent calls (there > might be 1000s of them) > > Once naptime is WORKER_INACTIVITY_NAPTIME_MS we know to reset it back > to WORKER_DEFAULT_NAPTIME_MS only if > (MySlotSyncWorker->monitoring_info.confirmed_lsn != > remote_slot->confirmed_lsn) is detected. 
> > Basically, I think the algorithm should be like the code below: > > TimestampTz now = GetCurrentTimestamp(); > > if (MySlotSyncWorker->monitoring_info.confirmed_lsn != > remote_slot->confirmed_lsn) > { > MySlotSyncWorker->monitoring_info.last_update_time = now; > MySlotSyncWorker->monitoring_info.confirmed_lsn = remote_slot->confirmed_lsn; > > /* Something changed; reset naptime to default. */ > *naptime = WORKER_DEFAULT_NAPTIME_MS; > } > else > { > if (*naptime == WORKER_DEFAULT_NAPTIME_MS) > { > /* If the inactivity time reaches the threshold, increase nap time. */ > if (TimestampDifferenceExceeds(MySlotSyncWorker->monitoring_info.last_update_time, > now, WORKER_INACTIVITY_THRESHOLD_MS)) > *naptime = WORKER_INACTIVITY_NAPTIME_MS; > } > } > Fixed as suggested. > ~~~ > > 11. get_remote_invalidation_cause > > +/* > + * Get Remote Slot's invalidation cause. > + * > + * This gets invalidation cause of remote slot. > + */ > +static ReplicationSlotInvalidationCause > +get_remote_invalidation_cause(WalReceiverConn *wrconn, char *slot_name) > +{ > > Isn't that function comment just repeating itself? > Fixed. > ~~~ > > 12. > + initStringInfo(&cmd); > + appendStringInfo(&cmd, > + "select pg_get_slot_invalidation_cause(%s)", > + quote_literal_cstr(slot_name)); > > Use uppercase "SELECT" for consistency with other SQL. > Fixed. > ~~~ > > 13. > + /* Make things live outside TX context */ > + MemoryContextSwitchTo(oldctx); > + > + initStringInfo(&cmd); > + appendStringInfo(&cmd, > + "select pg_get_slot_invalidation_cause(%s)", > + quote_literal_cstr(slot_name)); > + res = walrcv_exec(wrconn, cmd.data, 1, slotRow); > + pfree(cmd.data); > + > + CommitTransactionCommand(); > + > + /* Switch to oldctx we saved */ > + MemoryContextSwitchTo(oldctx); > > There are 2x MemoryContextSwitchTo(oldctx) here. Is that deliberate? > Yes, that is required as both start transaction and commit transaction could change memory context. > ~~~ > > 14. > + if (res->status != WALRCV_OK_TUPLES) > + ereport(ERROR, > + (errmsg("could not fetch invalidation cuase for slot \"%s\" from" > + " primary: %s", slot_name, res->err))); > > typo /cuase/cause/ > fixed. > ~~~ > > 15. > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > + if (!tuplestore_gettupleslot(res->tuplestore, true, false, slot)) > + ereport(ERROR, > + (errmsg("slot \"%s\" disapeared from the primary", > + slot_name))); > > typo /disapeared/disappeared/ > > ~~~ > > > 16. drop_obsolete_slots > > +/* > + * Drop obsolete slots > + * > + * Drop the slots which no longer need to be synced i.e. these either > + * do not exist on primary or are no longer part of synchronize_slot_names. > + * > + * Also drop the slots which are valid on primary and got invalidated > + * on standby due to conflict (say required rows removed on primary). > + * The assumption is, these will get recreated in next sync-cycle and > + * it is okay to drop and recreate such slots as long as these are not > + * consumable on standby (which is the case currently). > + */ > > /which no/that no/ > > /which are/that are/ > > /these will/that these will/ > > /and got invalidated/that got invalidated/ > Fixed. > ~~~ > > 17. 
> + /* If this slot is being monitored, clean-up the monitoring info */ > + if (strcmp(NameStr(local_slot->data.name), > + NameStr(MySlotSyncWorker->monitoring_info.slot_name)) == 0) > + { > + MemSet(NameStr(MySlotSyncWorker->monitoring_info.slot_name), 0, NAMEDATALEN); > + MySlotSyncWorker->monitoring_info.confirmed_lsn = 0; > + MySlotSyncWorker->monitoring_info.last_update_time = 0; > + } > > Maybe it is better to assign InvalidXLogRecPtr instead of 0 to the cleared lsn. > Removed this as the slot_name is no longer required in this structure. > ~ > > Alternatively, consider just zapping the entire monitoring_info > structure in one go: > MemSet(&MySlotSyncWorker->monitoring_info, 0, > sizeof(MySlotSyncWorker->monitoring_info)); > Code removed. > ~~~ > > 18. construct_slot_query (calling use_slot_in_query) > > This separation of functions (use_slot_in_query / > construct_slot_query) seems awkward to me. The use_slot_in_query() > function is only called by construct_slot_query(). I felt it might be > simpler to keep all the logical with the construct_slot_query(). > > Furthermore, it seemed strange to iterate all the DBs (to populate the > "WHERE database IN" clause) and then iterate all the DBs multiple > times again in use_slot_in_query (looking for slots to populate the > "AND slot_name IN (" clause). > > Maybe I misunderstand the reason for this structuring, but IMO it > would be simpler code to keep all the logic in construct_slot_query() > like: > > a. Initialize with empty dblist, empty slotlist. > b. Iterate all dbids > - constructing the dblist as you go > - constructing the slot list as you go (if synchronize_slot_names is > not "" or "*") > c. Finally, build the query: basic + dblist-clause + optional slotlist-clause > This I feel will make it more complicated as to get dbid of slot, we need to search hash, which requires locking, so keeping that seperate. > ~~~ > > 19. construct_slot_query > > Why does this function return a boolean? I only see it returns true, > but never false. > Fixed. > ~~~ > > 20. > + { > + ListCell *lc; > + bool first_slot = true; > + > + > + foreach(lc, sync_slot_names_list) > > Unnecessary blank line. > > ~~~ > > 21. synchronize_one_slot > > +/* > + * Synchronize single slot to given position. > + * > + * This creates new slot if there is no existing one and updates the > + * metadata of existing slots as per the data received from the primary. > + */ > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > /creates new slot/creates a new slot/ > > /metadata of existing slots/metadata of the slot/ > > ~~~ > > 22 > > + /* Search for the named slot and mark it active if we find it. */ > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > + for (int i = 0; i < max_replication_slots; i++) > + { > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > + > + if (!s->in_use) > + continue; > + > + if (strcmp(NameStr(s->data.name), remote_slot->name) == 0) > + { > + found = true; > + break; > + } > + } > + LWLockRelease(ReplicationSlotControlLock); > 22a. > "and mark it active if we find it." -- What code here is marking > anything active? > > ~ > > 22b. > Uncommon style of loop variable declaration > Fixed all above. > ~ > > 22c. > IMO it is over-complicated code; e.g. 
same loop can be written like this: > > SUGGESTION > for (i = 0; i < max_replication_slots && !found; i++) > { > ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > > if (s->in_use) > found = (strcmp(NameStr(s->data.name), remote_slot->name) == 0); > } > Fixed as suggested. > ~~~ > > 23. synchronize_slots > > + /* Construct query to get slots info from the primary */ > + initStringInfo(&s); > + if (!construct_slot_query(&s, dbids)) > + { > + pfree(s.data); > + CommitTransactionCommand(); > + LWLockRelease(SlotSyncWorkerLock); > + return naptime; > + } > > As noted elsewhere, it seems construct_slot_query() will never return > false and so this block of code is unreachable. > Removed this code. > ~~~ > > 24. > + /* Create list of remote slot names to be used by drop_obsolete_slots */ > + remote_slot_list = lappend(remote_slot_list, remote_slot); > > This is a list of slots, not just slot names. > Fixed. > ~~~ > > 25. > + /* > + * Update nap time in case of non-zero value returned. The zero value > + * is returned if remote_slot is not the one being monitored. > + */ > + value = compute_naptime(remote_slot); > + if (value) > + naptime = value; > > If the compute_naptime API is changed as suggested in a prior review > comment then this can be simplified to something like: > > SUGGESTION: > /* Update nap time as required depending on slot activity. */ > compute_naptime(remote_slot, &naptime); > Fixed. > ~~~ > > 26. > + /* > + * Drop local slots which no longer need to be synced i.e. these either do > + * not exist on primary or are no longer part of synchronize_slot_names. > + */ > + drop_obsolete_slots(dbids, remote_slot_list); > > /which no longer/that no longer/ > > I thought it might be better to omit the "i.e." part. Just leave it to > the function-header of drop_obsolete_slots for a detailed explanation > about *which* slots are candidates for dropping. > Fixed. > ~ > > 27. > + /* We are done, free remot_slot_list elements */ > + foreach(cell, remote_slot_list) > + { > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); > + > + pfree(remote_slot); > + } > > 27a. > /remot_slot_list/remote_slot_list/ > Fixed. > ~ > > 27b. > Isn't this just the same as the one-liner: > > list_free_deep(remote_slot_list); > > ~~~ > > 28. > +/* > + * Initialize the list from raw synchronize_slot_names and cache it, in order > + * to avoid parsing it repeatedly. Done at slot-sync worker startup and after > + * each SIGHUP. > + */ > +static void > +SlotSyncInitSlotNamesList() > +{ > + char *rawname; > + > + if (strcmp(synchronize_slot_names, "") != 0 && > + strcmp(synchronize_slot_names, "*") != 0) > + { > + rawname = pstrdup(synchronize_slot_names); > + SplitIdentifierString(rawname, ',', &sync_slot_names_list); > + } > +} > > 28a. > Why this static function name is camel-case, unlike all the others? > Fixed. > ~ > > 28b. > What about when the sync_slot_names_list changes from value to "" or > "*". Shouldn't this function be setting sync_slot_names_list = NIL for > that scenario? > I modified this logic to free sync_slot_names_list prior to setting and initializing it to NIL. > ~~~ > > 29. remote_connect > > +/* > + * Connect to remote (primary) server. > + * > + * This uses primary_conninfo in order to connect to primary. For slot-sync > + * to work, primary_conninfo is expected to have dbname as well. > + */ > +static WalReceiverConn * > +remote_connect() > > 29a. 
> I felt it might be more helpful to say "GUC primary_conninfo" instead > of just 'primary_conninfo' the first time this is mentioned. > fixed. > ~ > > 29b. > /connect to primary/connect to the primary/ > > ~ > > 29c. > /is expected to have/is required to specify/ > > ~~~ > > 30. reconnect_if_needed > > +/* > + * Reconnect to remote (primary) server if PrimaryConnInfo got changed. > + */ > +static WalReceiverConn * > +reconnect_if_needed(WalReceiverConn *wrconn_prev, char *conninfo_prev) > > /got changed/has changed/ > > ~~~ > > 31. > +static WalReceiverConn * > +reconnect_if_needed(WalReceiverConn *wrconn_prev, char *conninfo_prev) > +{ > + WalReceiverConn *wrconn = NULL; > + > + /* If no change in PrimaryConnInfo, return previous connection itself */ > + if (strcmp(conninfo_prev, PrimaryConnInfo) == 0) > + return wrconn_prev; > + > + walrcv_disconnect(wrconn); > + wrconn = remote_connect(); > + return wrconn; > +} > > /return previous/return the previous/ > > Disconnect NULL is a bug isn't it? Don't you mean to disconnect 'wrconn_prev'? > Fixed > ~~~ > > 32. slotsync_worker_detach > > +/* > + * Detach the worker from DSM and update 'proc' and 'in_use'. > + * Logical replication launcher will come to know using these > + * that the worker has shutdown. > + */ > +static void > +slotsync_worker_detach(int code, Datum arg) > +{ > + dsa_detach((dsa_area *) DatumGetPointer(arg)); > + LWLockAcquire(SlotSyncWorkerLock, LW_EXCLUSIVE); > + MySlotSyncWorker->hdr.in_use = false; > + MySlotSyncWorker->hdr.proc = NULL; > + LWLockRelease(SlotSyncWorkerLock); > +} > > I expected this function to be in the same module as > slotsync_worker_attach. It seems a bit strange to have them separated. > Both now are part of launcher.c file > ~~~ > > 33. ReplSlotSyncMain > > + ereport(ERROR, > + (errmsg("The dbname not specified in primary_conninfo, skipping" > + " slots synchronization"), > + errhint("Specify dbname in primary_conninfo for slots" > + " synchronization to proceed"))); > > /not specified in/was not specified in/ > > /slots synchronization/slot synchronization/ (??) -- there are multiple of these > > ~ > > 34. > + /* > + * Connect to the database specified by user in PrimaryConnInfo. We need > + * database connection for walrcv_exec to work. Please see comments atop > + * libpqrcv_exec. > + */ > > /database connection/a database connection/ > > ~~~ > > 35. > + /* Reconnect if primary_conninfo got changed */ > + if (config_reloaded) > + wrconn = reconnect_if_needed(wrconn, conninfo_prev); > > SUGGESTION > Reconnect if GUC primary_conninfo has changed. > > ~ > > 36. > + /* > + * The slot-sync worker must not get here because it will only stop when > + * it receives a SIGINT from the logical replication launcher, or when > + * there is an error. None of these cases will allow the code to reach > + * here. > + */ > + Assert(false); > > 36a. > /must not/cannot/ > > 36b. > "None of these cases will allow the code to reach here." <-- redundant sentence > Fixed all above. This patch-set also fixes the crash reported by Kuroda-san, thanks to Shveta for that fix. regards, Ajin Cherian Fujitsu Australia
Attachment
Hi Ajin. Thanks for addressing my previous review comments from v19. I checked all the changes. Below are a few follow-up remarks. On Thu, Oct 5, 2023 at 7:54 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Wed, Sep 27, 2023 at 2:37 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are some more review comments for the patch v19-0002. > > 3. get_local_synced_slot_names > > > > + for (int i = 0; i < max_replication_slots; i++) > > + { > > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > > + > > + /* Check if it is logical synchronized slot */ > > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > > + { > > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) > > + { > > > > Loop variables are not declared in the common PG code way. > > > > fixed. Yes, new declarations were added, but some of them (e.g. 'j') could have been declared at a lower scope closer to where they are being used. > > 5. use_slot_in_query > > > > +static bool > > +use_slot_in_query(char *slot_name, Oid *dbids) > > > > There are multiple non-standard for-loop variable declarations in this function. > > > > fixed. Yes, new declarations were added, but some of them (e.g. 'j') could have been declared at a lower scope closer to where they are being used. > > 11. get_remote_invalidation_cause > > > > +/* > > + * Get Remote Slot's invalidation cause. > > + * > > + * This gets invalidation cause of remote slot. > > + */ > > +static ReplicationSlotInvalidationCause > > +get_remote_invalidation_cause(WalReceiverConn *wrconn, char *slot_name) > > +{ > > > > Isn't that function comment just repeating itself? > > > > Fixed. /remote slot./the remote slot./ > > 27. > > + /* We are done, free remot_slot_list elements */ > > + foreach(cell, remote_slot_list) > > + { > > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); > > + > > + pfree(remote_slot); > > + } > > > > 27a. > > /remot_slot_list/remote_slot_list/ > > > > Fixed. > > > ~ > > > > 27b. > > Isn't this just the same as the one-liner: > > > > list_free_deep(remote_slot_list); It looks like the #27b comment was accidentally missed (??) > > 29. remote_connect > > > > +/* > > + * Connect to remote (primary) server. > > + * > > + * This uses primary_conninfo in order to connect to primary. For slot-sync > > + * to work, primary_conninfo is expected to have dbname as well. > > + */ > > +static WalReceiverConn * > > +remote_connect() > > > > 29a. > > I felt it might be more helpful to say "GUC primary_conninfo" instead > > of just 'primary_conninfo' the first time this is mentioned. > > > > fixed. The changed v21 comment now refers to "GUC PrimaryConnInfo" but I think that is wrong. The GUC really is called "primary_conninfo" --- PrimaryConnInfo is just the code static variable name. ====== Kind Regards, Peter Smith. Fujitsu Australia
On 2023-Sep-27, Peter Smith wrote: > 3. get_local_synced_slot_names > > + for (int i = 0; i < max_replication_slots; i++) > + { > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > + > + /* Check if it is logical synchronized slot */ > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > + { > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) > + { > > Loop variables are not declared in the common PG code way. Note that since we added C99 as a mandatory requirement for compilers in commit d9dd406fe281, we've been using declarations in loop initializers (see 143290efd079). We have almost 500 occurrences of this already. Older code, obviously, does not use them, but that's no reason not to introduce them in new code. I think they make the code a bit leaner, so I suggest to use these liberally. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
On Fri, Oct 6, 2023 at 2:07 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Sep-27, Peter Smith wrote: > > > 3. get_local_synced_slot_names > > > > + for (int i = 0; i < max_replication_slots; i++) > > + { > > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > > + > > + /* Check if it is logical synchronized slot */ > > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > > + { > > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) > > + { > > > > Loop variables are not declared in the common PG code way. > > Note that since we added C99 as a mandatory requirement for compilers in > commit d9dd406fe281, we've been using declarations in loop initializers > (see 143290efd079). We have almost 500 occurrences of this already. > Older code, obviously, does not use them, but that's no reason not to > introduce them in new code. I think they make the code a bit leaner, so > I suggest to use these liberally. > Okay, we will. Thanks for letting us know. thanks Shveta
On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/4/23 1:50 PM, shveta malik wrote: > > On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand > >> <bertranddrouvot.pg@gmail.com> wrote: > >>> > >>> On 10/4/23 6:26 AM, shveta malik wrote: > >>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>>> > >>>>> > >>>>> How about an alternate scheme where we define sync_slot_names on > >>>>> standby but then store the physical_slot_name in the corresponding > >>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the > >>>>> standby will send the list of 'sync_slot_names' and the primary will > >>>>> add the physical standby's slot_name in each of the corresponding > >>>>> sync_slot. Now, if we do this then even after restart, we should be > >>>>> able to know for which physical slot each logical slot needs to wait. > >>>>> We can even provide an SQL API to reset the value of > >>>>> standby_slot_names in logical slots as a way to unblock decoding in > >>>>> case of emergency (for example, corresponding when physical standby > >>>>> never comes up). > >>>>> > >>>> > >>>> > >>>> Looks like a better approach to me. It solves most of the pain points like: > >>>> 1) Avoids the need of multiple GUCs > >>>> 2) Primary and standby need not to worry to be in sync if we maintain > >>>> sync-slot-names GUC on both > >> > >> As per my understanding of this approach, we don't want > >> 'sync-slot-names' to be set on the primary. Do you have a different > >> understanding? > >> > > > > Same understanding. We do not need it to be set on primary by user. It > > will be GUC on standby and standby will convey it to primary. > > +1, same understanding here. > At PGConf NYC, I had a brief discussion on this topic with Andres where yet another approach to achieve this came up. Have a parameter like enable_failover at the slot level (this will be persistent information). Users can set it during the create/alter subscription or via pg_create_logical_replication_slot(). Also, on physical standby, there will be a parameter like enable_syncslot. All the physical standbys that have set enable_syncslot will receive all the logical slots that are marked as enable_failover. To me, whether to sync a particular slot is a slot-level property, so defining it in this new way seems reasonable. I think this will simplify the scheme a bit but still, the list of physical standby's for which logical slots wait during decoding needs to be maintained as we thought. But, how about with the above two parameters (enable_failover and enable_syncslot), we have standby_slot_names defined on the primary. That avoids the need to store the list of standby_slot_names in logical slots and simplifies the implementation quite a bit, right? Now, one can think if we have a parameter like 'standby_slot_names' then why do we need enable_syncslot on physical standby but that will be required to invoke sync worker which will pull logical slot's information? The advantage of having standby_slot_names defined on primary is that we can selectively wait on the subset of physical standbys where we are syncing the slots. 
I think this will be something similar to 'synchronous_standby_names' in the sense that the physical standbys mentioned in standby_slot_names will behave as synchronous copies with respect to slots, and after failover the user can switch to one of these physical standbys while the others start following the new master/publisher. Thoughts? -- With Regards, Amit Kapila.
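Put in concrete (and entirely tentative) terms, the scheme sketched in this mail might be used roughly as follows. The failover option, the enable_syncslot and standby_slot_names GUCs are only the names proposed here, and the connection string, publication, subscription and slot names are invented:

    -- On the subscriber: mark the subscription's slot as one that must survive
    -- failover (proposed option, not available at the time of this discussion).
    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=primary dbname=appdb'
        PUBLICATION pub1
        WITH (failover = true);

    -- On the primary: physical standbys whose slots the logical walsenders must
    -- wait for before sending decoded changes (proposed GUC).
    ALTER SYSTEM SET standby_slot_names = 'standby1_phys';

    -- On each physical standby that should receive failover-enabled slots
    -- (proposed GUC).
    ALTER SYSTEM SET enable_syncslot = on;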
On Fri, Oct 6, 2023 at 7:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Sep-27, Peter Smith wrote: > > > 3. get_local_synced_slot_names > > > > + for (int i = 0; i < max_replication_slots; i++) > > + { > > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > > + > > + /* Check if it is logical synchronized slot */ > > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > > + { > > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) > > + { > > > > Loop variables are not declared in the common PG code way. > > Note that since we added C99 as a mandatory requirement for compilers in > commit d9dd406fe281, we've been using declarations in loop initializers > (see 143290efd079). We have almost 500 occurrences of this already. > Older code, obviously, does not use them, but that's no reason not to > introduce them in new code. I think they make the code a bit leaner, so > I suggest to use these liberally. > I also prefer the C99 style, but I had misunderstood there was still a convention to keep using the old style for code consistency (e.g. many new patches I see still seem to use the old style). Thanks for confirming that C99 loop variables are fine for any new code. @Shveta/Ajin - please ignore/revert all my old review comments about this point. ====== Kind Regards, Peter Smith. Fujitsu Australia.
Hi, On 10/6/23 6:48 PM, Amit Kapila wrote: > On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 10/4/23 1:50 PM, shveta malik wrote: >>> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> >>>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand >>>> <bertranddrouvot.pg@gmail.com> wrote: >>>>> >>>>> On 10/4/23 6:26 AM, shveta malik wrote: >>>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>>>>> >>>>>>> >>>>>>> How about an alternate scheme where we define sync_slot_names on >>>>>>> standby but then store the physical_slot_name in the corresponding >>>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the >>>>>>> standby will send the list of 'sync_slot_names' and the primary will >>>>>>> add the physical standby's slot_name in each of the corresponding >>>>>>> sync_slot. Now, if we do this then even after restart, we should be >>>>>>> able to know for which physical slot each logical slot needs to wait. >>>>>>> We can even provide an SQL API to reset the value of >>>>>>> standby_slot_names in logical slots as a way to unblock decoding in >>>>>>> case of emergency (for example, corresponding when physical standby >>>>>>> never comes up). >>>>>>> >>>>>> >>>>>> >>>>>> Looks like a better approach to me. It solves most of the pain points like: >>>>>> 1) Avoids the need of multiple GUCs >>>>>> 2) Primary and standby need not to worry to be in sync if we maintain >>>>>> sync-slot-names GUC on both >>>> >>>> As per my understanding of this approach, we don't want >>>> 'sync-slot-names' to be set on the primary. Do you have a different >>>> understanding? >>>> >>> >>> Same understanding. We do not need it to be set on primary by user. It >>> will be GUC on standby and standby will convey it to primary. >> >> +1, same understanding here. >> > > At PGConf NYC, I had a brief discussion on this topic with Andres > where yet another approach to achieve this came up. Great! > Have a parameter > like enable_failover at the slot level (this will be persistent > information). Users can set it during the create/alter subscription or > via pg_create_logical_replication_slot(). Also, on physical standby, > there will be a parameter like enable_syncslot. All the physical > standbys that have set enable_syncslot will receive all the logical > slots that are marked as enable_failover. To me, whether to sync a > particular slot is a slot-level property, so defining it in this new > way seems reasonable. Yeah, as this is a slot-level property, I agree that this seems reasonable. Also that sounds more natural to me with this approach. The primary is really the one that "drives" which slots can be synced. I like it. One could also set enable_failover while creating a logical slot on a physical standby (so that cascading standbys could also have "extra slot" to sync as compare to "level 1" standbys). > > I think this will simplify the scheme a bit but still, the list of > physical standby's for which logical slots wait during decoding needs > to be maintained as we thought. Right. > But, how about with the above two > parameters (enable_failover and enable_syncslot), we have > standby_slot_names defined on the primary. That avoids the need to > store the list of standby_slot_names in logical slots and simplifies > the implementation quite a bit, right? Agree. 
> Now, one can think if we have a > parameter like 'standby_slot_names' then why do we need > enable_syncslot on physical standby but that will be required to > invoke sync worker which will pull logical slot's information? Yes, and enable_syncslot on the standby could also be used to "pause" the sync on standbys (by disabling the parameter) if one would want to (without the need to modify anything on the primary). > The > advantage of having standby_slot_names defined on primary is that we > can selectively wait on the subset of physical standbys where we are > syncing the slots. Yeah, and this flexibility/filtering looks somewhat mandatory to me. > I think this will be something similar to > 'synchronous_standby_names' in the sense that the physical standbys > mentioned in standby_slot_names will behave as synchronous copies with > respect to slots and after failover user can switch to one of these > physical standby and others can start following new master/publisher. > > Thoughts? I like the idea and I think that's the one that seems the more reasonable to me. I'd vote for this idea with: - standby_slot_names on the primary (could also be set on standbys in case of cascading context) - enable_failover at logical slot creation + API to enable/disable it at wish - enable_syncslot on the standbys Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
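Assuming enable_syncslot ends up being reloadable, pausing synchronization on one standby without touching the primary could then be as simple as the following (hypothetical; the parameter is only proposed at this point):

    -- on the standby: pause slot synchronization
    ALTER SYSTEM SET enable_syncslot = off;
    SELECT pg_reload_conf();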
On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > I like the idea and I think that's the one that seems the more reasonable > to me. I'd vote for this idea with: > > - standby_slot_names on the primary (could also be set on standbys in case of > cascading context) > - enable_failover at logical slot creation + API to enable/disable it at wish > - enable_syncslot on the standbys > Thanks, these definitions sound reasonable to me. -- With Regards, Amit Kapila.
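Under that scheme, the standby and subscriber sides would then be configured roughly as follows (a sketch; the parameter and option names are the proposed ones and may still change, and the connection values are examples):

    # postgresql.conf on the physical standby
    enable_syncslot = on

    -- on the subscriber, mark the subscription's slot for failover
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=primary dbname=appdb'
        PUBLICATION mypub
        WITH (enable_failover = true);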
PFA the v22 patch set. It has the below changes: patch 001: 1) The physical walsender now wakes up logical walsender(s) using a new CV, as suggested in [1]. 2) pg_logical_slot_get_changes (and other such get/peek functions) now also wait for standby confirmation. patch 002: 1) New column (synced_slot) added to pg_replication_slots to indicate whether a slot is a synced slot or a user-created one. 2) Any attempt to run pg_drop_replication_slot() on a synced slot will result in an error. 3) Addressed some of Peter's comments dated Oct 4 and Kuroda-san's comments dated Oct 2. Thanks Hou-san for working on the changes of patch 001. [1]: https://www.postgresql.org/message-id/a539e247-30c8-4d5c-b561-07d0949cc960%40gmail.com thanks Shveta
Attachment
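To illustrate the 0002 behaviour above on a standby, one would see something like the following (the slot name is hypothetical; the error wording is the patch's current text and may still change):

    -- which local slots were synchronized from the primary?
    SELECT slot_name, slot_type, synced_slot
    FROM pg_replication_slots;

    -- dropping a synced slot is rejected while in recovery
    SELECT pg_drop_replication_slot('mysub_slot');
    ERROR:  cannot drop replication slot
    DETAIL:  This slot is being synced from primary.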
On Wed, Oct 4, 2023 at 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v20-0002. > Thanks Peter for the feedback. Comments from 31 till end are addressed in v22. First 30 comments will be addressed in the next version. > ====== > 1. GENERAL - errmsg/elog messages > > There are a a lot of minor problems and/or quirks across all the > message texts. Here is a summary of some I found: > > ERROR > errmsg("could not receive list of slots from the primary server: %s", > errmsg("invalid response from primary server"), > errmsg("invalid connection string syntax: %s", > errmsg("replication slot-sync worker slot %d is empty, cannot attach", > errmsg("replication slot-sync worker slot %d is already used by > another worker, cannot attach", > errmsg("replication slot-sync worker slot %d is already used by > another worker, cannot attach", > errmsg("could not connect to the primary server: %s", > > errmsg("operation not permitted on replication slots on standby which > are synchronized from primary"))); > /primary/the primary/ > > errmsg("could not fetch invalidation cuase for slot \"%s\" from primary: %s", > /cuase/cause/ > /primary/the primary/ > > errmsg("slot \"%s\" disapeared from the primary", > /disapeared/disappeared/ > > errmsg("could not fetch slot info from the primary: %s", > errmsg("could not connect to the primary server: %s", err))); > errmsg("could not map dynamic shared memory segment for slot-sync worker"))); > > errmsg("physical replication slot %s found in synchronize_slot_names", > slot name not quoted? > --- > > WARNING > errmsg("out of background worker slots"), > > errmsg("Replication slot-sync worker failed to attach to worker-pool slot %d", > case? > > errmsg("Removed database %d from replication slot-sync worker %d; > dbcount now: %d", > case? > > errmsg("Skipping slots synchronization as primary_slot_name is not set.")); > case? > > errmsg("Skipping slots synchronization as hot_standby_feedback is off.")); > case? > > errmsg("Skipping slots synchronization as dbname is not specified in > primary_conninfo.")); > case? > > errmsg("slot-sync wait for slot %s interrupted by promotion, slot > creation aborted", > > errmsg("could not fetch slot info for slot \"%s\" from primary: %s", > /primary/the primary/ > > errmsg("slot \"%s\" disappeared from the primary, aborting slot creation", > errmsg("slot \"%s\" invalidated on primary, aborting slot creation", > > errmsg("slot-sync for slot %s interrupted by promotion, sync not possible", > slot name not quoted? > > errmsg("skipping sync of slot \"%s\" as the received slot-sync lsn > %X/%X is ahead of the standby position %X/%X", > > errmsg("not synchronizing slot %s; synchronization would move it backward", > slot name not quoted? > /backward/backwards/ > > --- > > LOG > errmsg("Added database %d to replication slot-sync worker %d; dbcount now: %d", > errmsg("Added database %d to replication slot-sync worker %d; dbcount now: %d", > errmsg("Stopping replication slot-sync worker %d", > errmsg("waiting for remote slot \"%s\" LSN (%u/%X) and catalog xmin > (%u) to pass local slot LSN (%u/%X) and and catalog xmin (%u)", > > errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)and catalog > xmin (%u) has now passed local slot LSN (%X/%X) and catalog xmin > (%u)", > missing spaces? > > elog(LOG, "Dropped replication slot \"%s\" ", > extra space? > why this one is elog but others are not? 
> > elog(LOG, "Replication slot-sync worker %d is shutting down on > receiving SIGINT", MySlotSyncWorker->slot); > case? > why this one is elog but others are not? > > elog(LOG, "Replication slot-sync worker %d started", worker_slot); > case? > why this one is elog but others are not? > ---- > > DEBUG1 > errmsg("allocated dsa for slot-sync worker for dbcount: %d" > worker number not given? > should be elog? > > errmsg_internal("logical replication launcher started") > should be elog? > > ---- > > DEBUG2 > elog(DEBUG2, "slot-sync worker%d's query:%s \n", > missing space after 'worker' > extra space before \n > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 2. libpqrcv_get_dbname_from_conninfo > > +/* > + * Get database name from primary conninfo. > + * > + * If dbanme is not found in connInfo, return NULL value. > + * The caller should take care of handling NULL value. > + */ > +static char * > +libpqrcv_get_dbname_from_conninfo(const char *connInfo) > > 2a. > /dbanme/dbname/ > > ~ > > 2b. > "The caller should take care of handling NULL value." > > IMO this is not very useful; it's like saying "caller must handle > function return values". > > ~~~ > > 3. > + for (opt = opts; opt->keyword != NULL; ++opt) > + { > + /* Ignore connection options that are not present. */ > + if (opt->val == NULL) > + continue; > + > + if (strcmp(opt->keyword, "dbname") == 0 && opt->val[0] != '\0') > + { > + dbname = pstrdup(opt->val); > + } > + } > > 3a. > If there are multiple "dbname" in the conninfo then it will be the > LAST one that is returned. > > Judging by my quick syntax experiment (below) this seemed like the > correct thing to do, but I think there should be some comment to > explain about it. > > test_sub=# create subscription sub1 connection 'dbname=foo dbname=bar > dbname=test_pub' publication pub1; > 2023-09-28 19:15:15.012 AEST [23997] WARNING: subscriptions created > by regression test cases should have names starting with "regress_" > WARNING: subscriptions created by regression test cases should have > names starting with "regress_" > NOTICE: created replication slot "sub1" on publisher > CREATE SUBSCRIPTION > > ~ > > 3b. > The block brackets {} are not needed for the single statement. > > ~ > > 3c. > Since there is only one keyword of interest here it seemed overkill to > have a separate 'continue' check. Why not do everything in one line: > > for (opt = opts; opt->keyword != NULL; ++opt) > { > if (strcmp(opt->keyword, "dbname") == 0 && opt->val && opt->val[0] != '\0') > dbname = pstrdup(opt->val); > } > > ====== > src/backend/replication/logical/launcher.c > > 4. > +/* > + * The local variables to store the current values of slot-sync related GUCs > + * before each ConfigReload. > + */ > +static char *PrimaryConnInfoPreReload = NULL; > +static char *PrimarySlotNamePreReload = NULL; > +static char *SyncSlotNamesPreReload = NULL; > > /The local variables/Local variables/ > > ~~~ > > 5. fwd declare > > static void logicalrep_worker_cleanup(LogicalRepWorker *worker); > +static void slotsync_worker_cleanup(SlotSyncWorker *worker); > static int logicalrep_pa_worker_count(Oid subid); > > 5a. > Hmmn, I think there were lot more added static functions than just this one. > > e.g. what about all these? 
> static SlotSyncWorker *slotsync_worker_find > static dsa_handle slotsync_dsa_setup > static bool slotsync_worker_launch_or_reuse > static void slotsync_worker_stop_internal > static void slotsync_workers_stop > static void slotsync_remove_obsolete_dbs > static WalReceiverConn *primary_connect > static void SaveCurrentSlotSyncConfigs > static bool SlotSyncConfigsChanged > static void ApplyLauncherStartSlotSync > static void ApplyLauncherStartSubs > > ~ > > 5b. > There are inconsistent name style used for the new static functions -- > e.g. snake_case versus CamelCase. > > ~~~ > > 6. WaitForReplicationWorkerAttach > > int rc; > + bool is_slotsync_worker = (lock == SlotSyncWorkerLock) ? true : false; > > This seemed a hacky way to distinguish the sync-slot workers from > other kinds of workers. Wouldn't it be better to pass another > parameter to this function? > > ~~~ > > 7. slotsync_worker_attach > > It looks like almost a clone of the logicalrep_worker_attach. Seems a > shame if cannot make use of common code. > > ~~~ > > 8. slotsync_worker_find > > + * Walks the slot-sync workers pool and searches for one that matches given > + * dbid. Since one worker can manage multiple dbs, so it walks the db array in > + * each worker to find the match. > > 8a. > SUGGESTION > Searches the slot-sync worker pool for the worker who manages the > specified dbid. Because a worker can manage multiple dbs, also walk > the db array of each worker to find the match. > > ~ > > 8b. > Should the comment also say something like "Returns NULL if no > matching worker is found." > > ~~~ > > 9. > + /* Search for attached worker for a given dbid */ > > SUGGESTION > Search for an attached worker managing the given dbid. > > ~~~ > > 10. > +{ > + int i; > + SlotSyncWorker *res = NULL; > + Oid *dbids; > + > + Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); > + > + /* Search for attached worker for a given dbid */ > + for (i = 0; i < max_slotsync_workers; i++) > + { > + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > + int cnt; > + > + if (!w->hdr.in_use) > + continue; > + > + dbids = (Oid *) dsa_get_address(w->dbids_dsa, w->dbids_dp); > + for (cnt = 0; cnt < w->dbcount; cnt++) > + { > + Oid wdbid = dbids[cnt]; > + > + if (wdbid == dbid) > + { > + res = w; > + break; > + } > + } > + > + /* If worker is found, break the outer loop */ > + if (res) > + break; > + } > + > + return res; > +} > > IMO this logical can be simplified a lot: > - by not using the 'res' variable; directly return instead. > - also moved the 'dbids' declaration. > - and 'cnt' variable seems not meaningful; replace with 'dbidx' for > the db array index IMO. > > For example (25 lines instead of 35 lines) > > { > int i; > > Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); > > /* Search for an attached worker managing the given dbid. */ > for (i = 0; i < max_slotsync_workers; i++) > { > SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > int dbidx; > Oid *dbids; > > if (!w->hdr.in_use) > continue; > > dbids = (Oid *) dsa_get_address(w->dbids_dsa, w->dbids_dp); > for (dbidx = 0; dbidx < w->dbcount; dbidx++) > { > if (dbids[dbidx] == dbid) > return w; > } > } > > return NULL; > } > > ~~~ > > 11. slot_sync_dsa_setup > > +/* > + * Setup DSA for slot-sync worker. > + * > + * DSA is needed for dbids array. Since max number of dbs a worker can manage > + * is not known, so initially fixed size to hold DB_PER_WORKER_ALLOC_INIT > + * dbs is allocated. 
If this size is exhausted, it can be extended using > + * dsa free and allocate routines. > + */ > +static dsa_handle > +slotsync_dsa_setup(SlotSyncWorker *worker, int alloc_db_count) > > 11a. > SUGGESTION > DSA is used for the dbids array. Because the maximum number of dbs a > worker can manage is not known, initially enough memory for > DB_PER_WORKER_ALLOC_INIT dbs is allocated. If this size is exhausted, > it can be extended using dsa free and allocate routines. > > ~ > > 11b. > It doesn't make sense for the comment to say DB_PER_WORKER_ALLOC_INIT > is the initial allocation, but then the function has a parameter > 'alloc_db_count' (which is always passed as DB_PER_WORKER_ALLOC_INIT). > IMO revemo the 2nd parameter from this function and hardwire the > initial allocation same as what the function comment says. > > ~~~ > > 12. > + /* Be sure any memory allocated by DSA routines is persistent. */ > + oldcontext = MemoryContextSwitchTo(TopMemoryContext); > > /Be sure any memory/Ensure the memory/ > > ~~~ > > 13. slotsync_worker_launch_or_reuse > > +/* > + * Slot-sync worker launch or reuse > + * > + * Start new slot-sync background worker from the pool of available workers > + * going by max_slotsync_workers count. If the worker pool is exhausted, > + * reuse the existing worker with minimum number of dbs. The idea is to > + * always distribute the dbs equally among launched workers. > + * If initially allocated dbids array is exhausted for the selected worker, > + * reallocate the dbids array with increased size and copy the existing > + * dbids to it and assign the new one as well. > + * > + * Returns true on success, false on failure. > + */ > > /going by/limited by/ (??) > > ~~~ > > 14. > + BackgroundWorker bgw; > + BackgroundWorkerHandle *bgw_handle; > + uint16 generation; > + SlotSyncWorker *worker = NULL; > + uint32 mindbcnt = 0; > + uint32 alloc_count = 0; > + uint32 copied_dbcnt = 0; > + Oid *copied_dbids = NULL; > + int worker_slot = -1; > + dsa_handle handle; > + Oid *dbids; > + int i; > + bool attach; > > IIUC many of these variables can be declared at a different scope in > this function, so they will be closer to where they are used. > > ~~~ > > 15. > + /* > + * We need to do the modification of the shared memory under lock so that > + * we have consistent view. > + */ > + LWLockAcquire(SlotSyncWorkerLock, LW_EXCLUSIVE); > > The current comment seems too much. > > SUGGESTION > The shared memory must only be modified under lock. > > ~~~ > > 16. > + /* Find unused worker slot. */ > + for (i = 0; i < max_slotsync_workers; i++) > + { > + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > + > + if (!w->hdr.in_use) > + { > + worker = w; > + worker_slot = i; > + break; > + } > + } > + > + /* > + * If all the workers are currently in use. Find the one with minimum > + * number of dbs and use that. > + */ > + if (!worker) > + { > + for (i = 0; i < max_slotsync_workers; i++) > + { > + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > + > + if (i == 0) > + { > + mindbcnt = w->dbcount; > + worker = w; > + worker_slot = i; > + } > + else if (w->dbcount < mindbcnt) > + { > + mindbcnt = w->dbcount; > + worker = w; > + worker_slot = i; > + } > + } > + } > > Why not combine these 2 loops, to avoid iterating over the same slots > twice? Then, exit the loop immediately if unused worker found, > otherwise if reach the end of loop having not found anything unused > then you will already know the one having least dbs. > > ~~~ > > 17. > + /* Remember the old dbids before we reallocate dsa. 
*/ > + copied_dbcnt = worker->dbcount; > + copied_dbids = (Oid *) palloc0(worker->dbcount * sizeof(Oid)); > + memcpy(copied_dbids, dbids, worker->dbcount * sizeof(Oid)); > > 17a. > Who frees this copied_dbids memory when you are finished needed it. It > seems allocated in the TopMemoryContext so IIUC this is a leak. > > ~ > > 17b. > These are the 'old' values. Not the 'copied' values. The copied_xxx > variable names seem misleading. > > ~~~ > > 18. > + /* Prepare the new worker. */ > + worker->hdr.launch_time = GetCurrentTimestamp(); > + worker->hdr.in_use = true; > > If a new worker is required then the launch_time is set like above. > > + { > + slot_db_data->last_launch_time = now; > + > + slotsync_worker_launch_or_reuse(slot_db_data->database); > + } > > Meanwhile, at the caller of slotsync_worker_launch_or_reuse(), the > dbid launch_time was already set as well. And those two timestamps are > almost (but not quite) the same value. Isn't that a bit strange? > > ~~~ > > 19. > + /* Initial DSA setup for dbids array to hold DB_PER_WORKER_ALLOC_INIT dbs */ > + handle = slotsync_dsa_setup(worker, DB_PER_WORKER_ALLOC_INIT); > + dbids = (Oid *) dsa_get_address(worker->dbids_dsa, worker->dbids_dp); > + > + dbids[worker->dbcount++] = dbid; > > Where was this worker->dbcount assigned to 0? > > Maybe it's better to do this explicity under the "/* Prepare the new > worker. */" comment. > > ~~~ > > 20. > + if (!attach) > + ereport(WARNING, > + (errmsg("Replication slot-sync worker failed to attach to " > + "worker-pool slot %d", worker_slot))); > + > + /* Attach is done, now safe to log that the worker is managing dbid */ > + if (attach) > + ereport(LOG, > + (errmsg("Added database %d to replication slot-sync " > + "worker %d; dbcount now: %d", > + dbid, worker_slot, worker->dbcount))); > > 20a. > IMO this should be coded as "if (attach) ...; else ..." > > ~ > > 99b. > In other code if it failed to register then slotsync_worker_cleanup > code is called. How come similar code is not done when fails to > attach? > > ~~~ > > 21. slotsync_worker_stop_internal > > +/* > + * Internal function to stop the slot-sync worker and wait until it detaches > + * from the slot-sync worker-pool slot. > + */ > +static void > +slotsync_worker_stop_internal(SlotSyncWorker *worker) > > IIUC this function does a bit more than what the function comment > says. IIUC (again) I think the "detached" worker slot will still be > flagged as 'inUse' but this function then does the extra step of > calling slotsync_worker_cleanup() function to make the worker slot > available for next process that needs it, am I correct? > > In this regard, this function seems a lot more like > logicalrep_worker_detach() function comment, so there seems some kind > of muddling of the different function names here... (??). > > ~~~ > > 22. slotsync_remove_obsolete_dbs > > This function says: > +/* > + * Slot-sync workers remove obsolete DBs from db-list > + * > + * If the DBIds fetched from the primary are lesser than the ones being managed > + * by slot-sync workers, remove extra dbs from worker's db-list. This > may happen > + * if some slots are removed on primary but 'synchronize_slot_names' has not > + * been changed yet. > + */ > +static void > +slotsync_remove_obsolete_dbs(List *remote_dbs) > > But, there was another similar logic function too: > > +/* > + * Drop obsolete slots > + * > + * Drop the slots which no longer need to be synced i.e. these either > + * do not exist on primary or are no longer part of synchronize_slot_names. 
> + * > + * Also drop the slots which are valid on primary and got invalidated > + * on standby due to conflict (say required rows removed on primary). > + * The assumption is, these will get recreated in next sync-cycle and > + * it is okay to drop and recreate such slots as long as these are not > + * consumable on standby (which is the case currently). > + */ > +static void > +drop_obsolete_slots(Oid *dbids, List *remote_slot_list) > > Those function header comments suggest these have a lot of overlapping > functionality. > > Can't those 2 functions be combined? Or maybe one delegate to the other? > > ~~~ > > 23. > + ListCell *lc; > + Oid *dbids; > + int widx; > + int dbidx; > + int i; > > Scope of some of these variable declarations can be different so they > are declared closer to where they are used. > > ~~~ > > 24. > + /* If not found, then delete this db from worker's db-list */ > + if (!found) > + { > + for (i = dbidx; i < worker->dbcount; i++) > + { > + /* Shift the DBs and get rid of wdbid */ > + if (i < (worker->dbcount - 1)) > + dbids[i] = dbids[i + 1]; > + } > > IIUC, that shift/loop could just have been a memmove() call to remove > one Oid element. > > ~~~ > > 25. > + /* If dbcount for any worker has become 0, shut it down */ > + for (widx = 0; widx < max_slotsync_workers; widx++) > + { > + SlotSyncWorker *worker = &LogicalRepCtx->ss_workers[widx]; > + > + if (worker->hdr.in_use && !worker->dbcount) > + slotsync_worker_stop_internal(worker); > + } > > Is it safe to stop this unguarded by SlotSyncWorkerLock locking? Is > there a window where another dbid decides to reuse this worker at the > same time this process is about to stop it? > > ~~~ > > 26. primary_connect > > +/* > + * Connect to primary server for slotsync purpose and return the connection > + * info. Disconnect previous connection if provided in wrconn_prev. > + */ > > /primary server/the primary server/ > > ~~~ > > 27. > + if (!RecoveryInProgress()) > + return NULL; > + > + if (max_slotsync_workers == 0) > + return NULL; > + > + if (strcmp(synchronize_slot_names, "") == 0) > + return NULL; > + > + /* The primary_slot_name is not set */ > + if (!WalRcv || WalRcv->slotname[0] == '\0') > + { > + ereport(WARNING, > + errmsg("Skipping slots synchronization as primary_slot_name " > + "is not set.")); > + return NULL; > + } > + > + /* The hot_standby_feedback must be ON for slot-sync to work */ > + if (!hot_standby_feedback) > + { > + ereport(WARNING, > + errmsg("Skipping slots synchronization as hot_standby_feedback " > + "is off.")); > + return NULL; > + } > > How come some of these checks giving WARNING that slot synchronization > will be skipped, but others are just silently returning NULL? > > ~~~ > > 28. SaveCurrentSlotSyncConfigs > > +static void > +SaveCurrentSlotSyncConfigs() > +{ > + PrimaryConnInfoPreReload = pstrdup(PrimaryConnInfo); > + PrimarySlotNamePreReload = pstrdup(WalRcv->slotname); > + SyncSlotNamesPreReload = pstrdup(synchronize_slot_names); > +} > > Shouldn't this code also do pfree first? Otherwise these will slowly > leak every time this function is called, right? > > ~~~ > > 29. SlotSyncConfigsChanged > > +static bool > +SlotSyncConfigsChanged() > +{ > + if (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0) > + return true; > + > + if (strcmp(PrimarySlotNamePreReload, WalRcv->slotname) != 0) > + return true; > + > + if (strcmp(SyncSlotNamesPreReload, synchronize_slot_names) != 0) > + return true; > > I felt those can all be combined to have 1 return instead of 3. > > ~~~ > > 30. 
> + /* > + * If we have reached this stage, it means original value of > + * hot_standby_feedback was 'true', so consider it changed if 'false' now. > + */ > + if (!hot_standby_feedback) > + return true; > > "If we have reached this stage" seems a bit vague. Can this have some > more explanation? And, maybe also an Assert(hot_standby_feedback); is > helpful in the calling code (before the config is reloaded)? > > ~~~ > > 31. ApplyLauncherStartSlotSync > > + * It connects to primary, get the list of DBIDs for slots configured in > + * synchronize_slot_names. It then launces the slot-sync workers as per > + * max_slotsync_workers and then assign the DBs equally to the workers > + * launched. > + */ > > SUGGESTION (fix typos etc) > Connect to the primary, to get the list of DBIDs for slots configured > in synchronize_slot_names. Then launch slot-sync workers (limited by > max_slotsync_workers) where the DBs are distributed equally among > those workers. > > ~~~ > > 32. > +static void > +ApplyLauncherStartSlotSync(long *wait_time, WalReceiverConn *wrconn) > > Why does this function even have 'Apply' in the name when it is > nothing to do with an apply worker; looks like some cut/paste > hangover. How about calling it something like 'LaunchSlotSyncWorkers' > > ~~~ > > 33. > + /* If connection is NULL due to lack of correct configurations, return */ > + if (!wrconn) > + return; > > IMO it would be better to Assert wrconn in this function. If it is > NULL then it should be checked a the caller, otherwise it just raises > more questions -- like "who logged the warning about bad > configuration" etc (which I already questions the NULL returns of > primary_connect. > > ~~~ > > 34. > + if (!OidIsValid(slot_db_data->database)) > + continue; > > This represents some kind of integrity error doesn't it? Is it really > OK just to silently skip such a thing? > > ~~~ > > 35. > + /* > + * If the worker is eligible to start now, launch it. Otherwise, > + * adjust wait_time so that we'll wake up as soon as it can be > + * started. > + * > + * Each apply worker can only be restarted once per > + * wal_retrieve_retry_interval, so that errors do not cause us to > + * repeatedly restart the worker as fast as possible. > + */ > > 35a. > I found the "we" part of "so that we'll wake up..." to be a bit > misleading. There is no waiting in this function; that wait value is > handed back to the caller to deal with. TBH, I did not really > understand why it is even necessary tp separate the waiting > calculation *per-worker* like this. It seems to overcomplicate things > and it might even give results like 1st worker is not started but last > works is started (if enough time elapsed in the loop). Why can't all > this wait logic be done one time up front, and either (a) start all > necessary workers, or (b) start none of them and wait a bit longer. > > ~ > > 35b. > "Each apply worker". Why is this talking about "apply" workers? Maybe > cut/paste error? > > ~~~ > > 36. > + last_launch_tried = slot_db_data->last_launch_time; > + now = GetCurrentTimestamp(); > + if (last_launch_tried == 0 || > + (elapsed = TimestampDifferenceMilliseconds(last_launch_tried, now)) >= > + wal_retrieve_retry_interval) > + { > + slot_db_data->last_launch_time = now; > + > + slotsync_worker_launch_or_reuse(slot_db_data->database); > + } > + else > + { > + *wait_time = Min(*wait_time, > + wal_retrieve_retry_interval - elapsed); > + } > > 36a. > IMO this might be simpler if you add another variable like bool 'launch_now': > > last_launch_tried = ... 
> now = ... > elapsed = ... > launch_now = elapsed >= wal_retrieve_retry_interval; > > ~ > > 36b. > Do you really care about checking "last_launch_tried == 0"; If it > really is zero, then I thought the elapsed check should be enough. > > ~ > > 36c. > Does this 'last_launch_time' really need to be in some shared memory? > Won't a static variable suffice? > > > ~~~ > > 37. ApplyLauncherStartSubs > > Wouldn't a better name for the function be something like > 'LaunchSubscriptionApplyWorker'? (it is a better match for the > suggested LaunchSlotSyncWorkers) > > > ~~~ > > 38. ApplyLauncherMain > > Now that this is not only for Apply worker but also for SlotSync > workers, maybe this function should be renamed as just LauncherMain, > or something equally generic? > > ~~~ > > 39. > + load_file("libpqwalreceiver", false); > + > + wrconn = primary_connect(NULL); > + > > This connection did not exist in the HEAD code so I think it is added > only for the slot-sync logic. IIUC it is still doing nothing for the > non-slot-sync cases because primary_connect will silently return in > that case: > > + if (!RecoveryInProgress()) > + return NULL; > > IMO this is too sneaky, and it is misleading to see the normal apply > worker launch apparently ccnnecting to something when it is not really > doing so AFAIK. I think these conditions should be done explicity here > at the caller to remove any such ambiguity. > > ~~~ > > 40. > + if (!RecoveryInProgress()) > + ApplyLauncherStartSubs(&wait_time); > + else > + ApplyLauncherStartSlotSync(&wait_time, wrconn); > > 40a. > IMO this is deserving of a comment to explain why RecoveryInProgress > means to perform the slot-synchronization. > > ~ > > 40b. > Also, better to have positive check RecoveryInProgress() instead of > !RecoveryInProgress() > > ~~~ > > 41. > if (ConfigReloadPending) > { > + bool ssConfigChanged = false; > + > + SaveCurrentSlotSyncConfigs(); > + > ConfigReloadPending = false; > ProcessConfigFile(PGC_SIGHUP); > + > + /* > + * Stop the slot-sync workers if any of the related GUCs changed. > + * These will be relaunched as per the new values during next > + * sync-cycle. > + */ > + ssConfigChanged = SlotSyncConfigsChanged(); > + if (ssConfigChanged) > + slotsync_workers_stop(); > + > + /* Reconnect in case primary_conninfo has changed */ > + wrconn = primary_connect(wrconn); > } > } > > ~ > > 41a. > The 'ssConfigChanged' assignement at declaration is not needed. > Indeed, the whole variable is not really necessary because it is used > only once. > > ~ > > 41b. > /as per the new values/using the new values/ > > ~ > > 41c. > + /* Reconnect in case primary_conninfo has changed */ > + wrconn = primary_connect(wrconn); > > To avoid unnecessary reconnections, shouldn't this be done only if > (ssConfigChanged). > > In fact, assuming the comment is correct, reconnect only if > (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0) > > > ====== > src/backend/replication/logical/slotsync.c > > 42. wait_for_primary_slot_catchup > > + ereport(LOG, > + errmsg("waiting for remote slot \"%s\" LSN (%u/%X) and catalog xmin" > + " (%u) to pass local slot LSN (%u/%X) and and catalog xmin (%u)", > + remote_slot->name, > + LSN_FORMAT_ARGS(remote_slot->restart_lsn), > + remote_slot->catalog_xmin, > + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), > + MyReplicationSlot->data.catalog_xmin)); > > AFAIK it is usual for the LSN format string to be %X/%X (not %u/%X like here). > > ~~~ > > 43. 
> + appendStringInfo(&cmd, > + "SELECT restart_lsn, confirmed_flush_lsn, catalog_xmin" > + " FROM pg_catalog.pg_replication_slots" > + " WHERE slot_name = %s", > + quote_literal_cstr(remote_slot->name)); > > double space before FROM? > > ~~~ > > 44. synchronize_one_slot > > + /* > + * We might not have the WALs retained locally corresponding to > + * remote's restart_lsn if our local restart_lsn and/or local > + * catalog_xmin is ahead of remote's one. And thus we can not create > + * the local slot in sync with primary as that would mean moving local > + * slot backward. Thus wait for primary's restart_lsn and catalog_xmin > + * to catch up with the local ones and then do the sync. > + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || > + TransactionIdPrecedes(remote_slot->catalog_xmin, > + MyReplicationSlot->data.catalog_xmin)) > + { > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot)) > + { > + /* > + * The remote slot didn't catch up to locally reserved > + * position > + */ > + ReplicationSlotRelease(); > + CommitTransactionCommand(); > + return; > + } > > > SUGGESTION (comment is slightly simplified) > If the local restart_lsn and/or local catalog_xmin is ahead of those > on the remote then we cannot create the local slot in sync with > primary because that would mean moving local slot backwards. In this > case we will wait for primary's restart_lsn and catalog_xmin to catch > up with the local one before attempting the sync. > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
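Regarding comment 24 above, the single memmove() would look something like this (a sketch; the subsequent adjustment of worker->dbcount is not shown, matching the quoted snippet):

    /* Remove the entry at dbidx by shifting the tail of the array down one */
    memmove(&dbids[dbidx], &dbids[dbidx + 1],
            (worker->dbcount - dbidx - 1) * sizeof(Oid));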
On Mon, Oct 9, 2023 at 3:24 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Fri, Oct 6, 2023 at 7:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2023-Sep-27, Peter Smith wrote: > > > > > 3. get_local_synced_slot_names > > > > > > + for (int i = 0; i < max_replication_slots; i++) > > > + { > > > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > > > + > > > + /* Check if it is logical synchronized slot */ > > > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > > > + { > > > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++) > > > + { > > > > > > Loop variables are not declared in the common PG code way. > > > > Note that since we added C99 as a mandatory requirement for compilers in > > commit d9dd406fe281, we've been using declarations in loop initializers > > (see 143290efd079). We have almost 500 occurrences of this already. > > Older code, obviously, does not use them, but that's no reason not to > > introduce them in new code. I think they make the code a bit leaner, so > > I suggest to use these liberally. > > > > I also prefer the C99 style, but I had misunderstood there was still a > convention to keep using the old style for code consistency (e.g. many > new patches I see still seem to use the old style). > > Thanks for confirming that C99 loop variables are fine for any new code. > > @Shveta/Ajin - please ignore/revert all my old review comments about this point. > Sure, reverted all such changes in v22. Now we have declarations in loop initializers. thanks Shveta
On Mon, Oct 2, 2023 at 4:29 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > Thank you for updating the patch! Thanks for the feedback Kuroda-san. I have addressed most of these in v22. Please find my comments inline. > > I found another ERROR due to the slot removal. Is this a real issue? > > 1. applied add_sleep.txt, which emulated the case the tablesync worker stucked > and the primary crashed during the > initial sync. > 2. executed test_0925_v2.sh (You attached in [1]) > 3. secondary could not start the logical replication because the slot was not > created (log files were also attached). > > > Here is my analysis. The cause is that the slotsync worker aborts the slot creation > on secondary server because the restart_lsn of secondary ahead the primary's one. > IIUC it can be occurred when tablesync workers finishes initial copy before > walsenders stream changes. In this case, the relstate of the worker is set to > SUBREL_STATE_CATCHUP and the apply worker waits till the relation becomes > SUBREL_STATE_SYNCDONE. From here the slot on primary will not be updated until > the relation is caught up. If some changes are come and the primary crashes at > that time, the syncslot worker will abort the slot creation. > > > Anyway, followings are my comments. > I have not checked detailed conventions yet. It should be done in later stage. > > > ~~~~~~~~~~~~~~~~ > > For 0001: > === > > WalSndWaitForStandbyConfirmation() > > ``` > + /* If postmaster asked us to stop, don't wait anymore */ > + if (got_STOPPING) > + break; > ``` > > I have considered again, and it may still have an issue: logical walsenders may > break from the loop before physical walsenders send WALs. This may be occurred > because both physical and logical walsenders would get PROCSIG_WALSND_INIT_STOPPING. > > I think a function like WalSndWaitStopping() must be needed, which waits until > physical walsenders become WALSNDSTATE_STOPPING or exit. Thought? > > WalSndWaitForStandbyConfirmation() > > ``` > + standby_slot_cpy = list_copy(standby_slot_names_list); > ``` > > I found that standby_slot_names_list and standby_slot_cpy would not be updated > even if the GUC was updated. Is this acceptable? Won't it be occurred after you > refactor the patch? > What would be occurred when synchronize_slot_names is updated on secondary > while primary executes this? > > WalSndWaitForStandbyConfirmation() > > ``` > + > + goto retry; > ``` Yes, there could be a problem here. I will review it. Allow some more time for this. > > I checked other "goto retry;", but I could not find the pattern that the return > clause does not exist after the goto (exception: void function). I also think > that current style seems a bit strange. How about using an outer loop like > While (list_length(standby_slot_cpy))? > > ===== > > slot.h > > ``` > +extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn); > ``` > > WaitForStandbyLSN() does not exist. > Done. > ~~~~~~~~~~~~~~~~ > > For 0002: > ===== > > General > > The patch requires that primary_conninfo must contain the dbname, but it > conflicts with documentation. It says: > > ``` > ...Do not specify a database name in the primary_conninfo string. > ``` > > I confirmed [^a] it is harmless that primary_conninfo has dbname, but at least > the description must be fixed. > Done. > General > > I found that primary server output huge amount of logs when the log_min_duration_messages = 0. > This ie because slotsync worker sends an SQL per 10ms, in wait_for_primary_slot_catchup(). 
> Is there any good way to suppress it? Or, should we be patient? > I will review it to see if we can do anything here. > ===== > > ``` > +{ oid => '6312', descr => 'what caused the replication slot to become invalid', > ``` > > How did you determine the oid? IIRC, developping features should use oids in > the range 8000-9999. See src/include/catalog/unused_oids. > Corrected it. > ===== > > LogicalRepCtxStruct > > ``` > /* Background workers. */ > + SlotSyncWorker *ss_workers; /* slot-sync workers */ > LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER]; > ``` > > It's OK for now, but can we combine them into an array? IIUC there is no > possibility to exist both of processes and they have same component, so it may > be able to be same. It can reduce an attribute but may lead some > difficulties to read. I feel it will add to more confusion and should be kept separate. > > WaitForReplicationWorkerAttach() and logicalrep_worker_stop_internal() > > I could not find cases that has "LWLock *" as an argument (exception: functions in lwlock.c). > Is it sufficient to check RecoveryInProgress() instead of specifying as arguments? > I feel it should be argument based. If not lock-based then a different arg perhaps. Let us say it is in the process of starting a worker and it failed and now it wants to do cleanup. It should do the cleanup of the worker it attempted to start instead of doing it based on 'RecoveryInProgress'. Latter's value may change if standby is promoted in between resulting in an attempt to do cleanup of the wrong type of worker. > ===== > > wait_for_primary_slot_catchup() > > ``` > + /* Check if this standby is promoted while we are waiting */ > + if (!RecoveryInProgress()) > + { > + /* > + * The remote slot didn't pass the locally reserved position at > + * the time of local promotion, so it's not safe to use. > + */ > + ereport( > + WARNING, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg( > + "slot-sync wait for slot %s interrupted by promotion, " > + "slot creation aborted", remote_slot->name))); > + pfree(cmd.data); > + return false; > + } > ``` > > The part would not be executed if the promote signal is sent after the primary > server crashes. I think walrcv_exec() will detect the failure first. > The function must be wrapped by PG_TRY() and the message must be emitted in > PG_CATCH(). There may be other approaches. walrcv_exec() may fail because of other reasons too. So generalising it to failure msg due to a promotion might not be the correct thing to do. We check if standby is promoted just before walrcv_exec(), so I feel that should suffice. > > wait_for_primary_slot_catchup() > > ``` > + rc = WaitLatch(MyLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, > + WORKER_DEFAULT_NAPTIME_MS, > + WAIT_EVENT_REPL_SLOTSYNC_MAIN); > ``` > > New wait event can be added. Done. > > > [1]: https://www.postgresql.org/message-id/CAJpy0uDD%2B9aJnDx9fBfvLvxJtxA7qqoAys4fo6h1tq1b_0_A7Q%40mail.gmail.com > [^a] > > Regarding the secondary side, the libpqrcv_connect() does not do special things > even if the primary_conninfo has dbname="XXX". It adds parameters like > "replication=true" and sends a startup packet. > > As for the primary side, the startup packet is consumed in ProcessStartupPacket(). > It checks whether the process should be a walsender or not (line 2204). > > Then (line 2290) the port->database_name[0] is set as '\0' in case of walsender. > The value is used for setting the process title in BackendInitialize(). 
> > Also, InitPostgres() really sets some global variables like MyDatabaseId, > but it is not occurred when the process is walsender. > This is expected behaviour. The presence of dbname in primary_conninfo should not affect the physical streaming connection; it should only be used for the slotsync worker's connection. That is the case currently: when logical=false (i.e. for physical streaming), we ignore dbname during connection, and when logical=true (slotsync connection), we consider using it. > Best Regards, > Hayato Kuroda > FUJITSU LIMITED >
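Putting the connection requirements above together, a standby that wants slot synchronization needs settings along these lines (values are examples; the dbname is used only by the slot-sync worker's connection and is ignored for physical streaming):

    # postgresql.conf on the standby
    primary_conninfo = 'host=primary port=5432 user=repluser dbname=postgres'
    primary_slot_name = 'sb1_slot'
    hot_standby_feedback = on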
On Mon, Oct 9, 2023 at 9:34 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Oct 4, 2023 at 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are some review comments for v20-0002. > > > > Thanks Peter for the feedback. Comments from 31 till end are addressed > in v22. First 30 comments will be addressed in the next version. > Thanks for addressing my previous comments. I checked those and went through other changes in v22-0002 to give a few more review comments below. I understand there are some design changes coming soon regarding the use of GUCs so maybe a few of these comments will become redundant. ====== doc/src/sgml/config.sgml 1. A password needs to be provided too, if the sender demands password authentication. It can be provided in the <varname>primary_conninfo</varname> string, or in a separate - <filename>~/.pgpass</filename> file on the standby server (use - <literal>replication</literal> as the database name). - Do not specify a database name in the - <varname>primary_conninfo</varname> string. + <filename>~/.pgpass</filename> file on the standby server. + </para> + <para> + Specify a database name in <varname>primary_conninfo</varname> string + to allow synchronization of slots from the primary to standby. This + dbname will only be used for slots synchronization purpose and will + be irrelevant for streaming. </para> 1a. "Specify a database name in...". Shouldn't that say "Specify dbname in..."? ~ 1b. BEFORE This dbname will only be used for slots synchronization purpose and will be irrelevant for streaming. SUGGESTION This will only be used for slot synchronization. It is ignored for streaming. ====== doc/src/sgml/system-views.sgml 2. pg_replication_slots + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>synced_slot</structfield> <type>bool</type> + </para> + <para> + True if this logical slot is created on physical standby as part of + slot-synchronization from primary server. Always false for physical slots. + </para></entry> + </row> /on physical standby/on the physical standby/ /from primary server/from the primary server/ ====== src/backend/replication/logical/launcher.c 3. LaunchSlotSyncWorkers + /* + * If we failed to launch this slotsync worker, return and try + * launching rest of the workers in next sync cycle. But change + * launcher's wait time to minimum of wal_retrieve_retry_interval and + * default wait time to try next sync-cycle sooner. + */ 3a. Use consistent terms -- choose "sync cycle" or "sync-cycle" ~ 3b. Is it correct to just say "rest of the workers"; won't it also try to relaunch this same failed worker again? ~~~ 4. LauncherMain + /* + * Stop the slot-sync workers if any of the related GUCs changed. + * These will be relaunched using the new values during next + * sync-cycle. Also revalidate the new configurations and + * reconnect. + */ + if (SlotSyncConfigsChanged()) + { + slotsync_workers_stop(); + + if (wrconn) + walrcv_disconnect(wrconn); + + if (RecoveryInProgress()) + wrconn = slotsync_remote_connect(); + } Was it overkill to disconnect/reconnect every time any of those GUCs changed? Or is it enough to do that only if the PrimaryConnInfoPreReload was changed? ====== src/backend/replication/logical/logical.c 5. CreateDecodingContext + /* + * Do not allow consumption of a "synchronized" slot until the standby + * gets promoted. 
+ */ + if (RecoveryInProgress() && slot->data.synced) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot use replication slot \"%s\" for logical decoding", + NameStr(slot->data.name)), + errdetail("This slot is being synced from primary."), + errhint("Specify another replication slot."))); + /from primary/from the primary/ ====== src/backend/replication/logical/slotsync.c 6. use_slot_in_query + /* + * Return TRUE if either slot is not yet created on standby or if it + * belongs to one of the dbs passed in dbids. + */ + if (!slot_found || relevant_db) + return true; + + return false; Same as single line: return (!slot_found || relevant_db); ~~~ 7. synchronize_one_slot + /* + * If the local restart_lsn and/or local catalog_xmin is ahead of + * those on the remote then we cannot create the local slot in sync + * with primary because that would mean moving local slot backwards + * and we might not have WALs retained for old lsns. In this case we + * will wait for primary's restart_lsn and catalog_xmin to catch up + * with the local one before attempting the sync. + */ /moving local slot/moving the local slot/ /with primary/with the primary/ /wait for primary's/wait for the primary's/ ~~~ 8. ProcessSlotSyncInterrupts + if (ConfigReloadPending) + { + ConfigReloadPending = false; + + /* Save the PrimaryConnInfo before reloading */ + *conninfo_prev = pstrdup(PrimaryConnInfo); If the configuration keeps changing then there might be a slow leak here because I didn't notice anywhere where this strdup'ed string is getting freed. Is that something worth worrying about? ====== src/backend/replication/slot.c 9. ReplicationSlotDrop + /* + * Do not allow users to drop the slots which are currently being synced + * from the primary to standby. + */ + if (user_cmd && RecoveryInProgress() && MyReplicationSlot->data.synced) + { + ReplicationSlotRelease(); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot drop replication slot"), + errdetail("This slot is being synced from primary."))); + } + 9a. /to standby/to the standby/ ~ 9b. Shouldn't the errmsg name the slot? Otherwise, the message might not be so useful. ~ 9c. /synced from primary/synced from the primary/ ====== src/backend/replication/walsender.c 10. ListSlotDatabaseOIDs + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); + for (slotno = 0; slotno < max_replication_slots; slotno++) + { + ReplicationSlot *slot = &ReplicationSlotCtl->replication_slots[slotno]; This is all new code so you can use C99 for loop variable declaration here. ~~~ 11. + /* If synchronize_slot_names is '*', then skip physical slots */ + if (SlotIsPhysical(slot)) + continue; + Some mental gymnastics are needed to understand how this code means " synchronize_slot_names is '*'". IMO it would be easier to understand if the previous "if (numslot_names)" was rewritten as if/else. ====== .../utils/activity/wait_event_names.txt 12. RECOVERY_WAL_STREAM "Waiting in main loop of startup process for WAL to arrive, during streaming recovery." +REPL_SLOTSYNC_MAIN "Waiting in main loop of worker for synchronizing slots to a standby from primary." +REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for primary to catch-up in worker for synchronizing slots to a standby from primary." SYSLOGGER_MAIN "Waiting in main loop of syslogger process." 12a. Maybe those descriptions can be simplified a bit? SUGGESTION REPL_SLOTSYNC_MAIN "Waiting in the main loop of slot-sync worker." 
REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for the primary to catch up, in slot-sync worker." ~ 12b. typo? /REPL_SLOTSYNC_PRIMARY_CATCHP/REPL_SLOTSYNC_PRIMARY_CATCHUP/ ====== src/include/replication/walreceiver.h 13. WalRcvRepSlotDbData +/* + * Slot's DBid related data + */ +typedef struct WalRcvRepSlotDbData +{ + Oid database; /* Slot's DBid received from remote */ +} WalRcvRepSlotDbData; Just calling this new field 'database' seems odd. Searching PG src I found typical fields/variables like this one are called 'databaseid', or 'dboid', or 'dbid', or 'db_id' etc. ====== Kind Regards, Peter Smith. Fujitsu Australia
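For comment 9b, one way to name the slot while keeping the early ReplicationSlotRelease() would be to capture the name first, roughly like this (illustrative wording only, not the patch's actual text):

    char    slotname[NAMEDATALEN];

    /* Capture the name before releasing the slot, then report it */
    strlcpy(slotname, NameStr(MyReplicationSlot->data.name), NAMEDATALEN);
    ReplicationSlotRelease();
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("cannot drop replication slot \"%s\"", slotname),
             errdetail("This slot is being synced from the primary.")));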
On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/6/23 6:48 PM, Amit Kapila wrote: > > On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> On 10/4/23 1:50 PM, shveta malik wrote: > >>> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>> > >>>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand > >>>> <bertranddrouvot.pg@gmail.com> wrote: > >>>>> > >>>>> On 10/4/23 6:26 AM, shveta malik wrote: > >>>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>>>>> > >>>>>>> > >>>>>>> How about an alternate scheme where we define sync_slot_names on > >>>>>>> standby but then store the physical_slot_name in the corresponding > >>>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the > >>>>>>> standby will send the list of 'sync_slot_names' and the primary will > >>>>>>> add the physical standby's slot_name in each of the corresponding > >>>>>>> sync_slot. Now, if we do this then even after restart, we should be > >>>>>>> able to know for which physical slot each logical slot needs to wait. > >>>>>>> We can even provide an SQL API to reset the value of > >>>>>>> standby_slot_names in logical slots as a way to unblock decoding in > >>>>>>> case of emergency (for example, corresponding when physical standby > >>>>>>> never comes up). > >>>>>>> > >>>>>> > >>>>>> > >>>>>> Looks like a better approach to me. It solves most of the pain points like: > >>>>>> 1) Avoids the need of multiple GUCs > >>>>>> 2) Primary and standby need not to worry to be in sync if we maintain > >>>>>> sync-slot-names GUC on both > >>>> > >>>> As per my understanding of this approach, we don't want > >>>> 'sync-slot-names' to be set on the primary. Do you have a different > >>>> understanding? > >>>> > >>> > >>> Same understanding. We do not need it to be set on primary by user. It > >>> will be GUC on standby and standby will convey it to primary. > >> > >> +1, same understanding here. > >> > > > > At PGConf NYC, I had a brief discussion on this topic with Andres > > where yet another approach to achieve this came up. > > Great! > > > Have a parameter > > like enable_failover at the slot level (this will be persistent > > information). Users can set it during the create/alter subscription or > > via pg_create_logical_replication_slot(). Also, on physical standby, > > there will be a parameter like enable_syncslot. All the physical > > standbys that have set enable_syncslot will receive all the logical > > slots that are marked as enable_failover. To me, whether to sync a > > particular slot is a slot-level property, so defining it in this new > > way seems reasonable. > > Yeah, as this is a slot-level property, I agree that this seems reasonable. > > Also that sounds more natural to me with this approach. The primary > is really the one that "drives" which slots can be synced. I like it. > > One could also set enable_failover while creating a logical slot on a physical > standby (so that cascading standbys could also have "extra slot" to sync as > compare to "level 1" standbys). > > > > > I think this will simplify the scheme a bit but still, the list of > > physical standby's for which logical slots wait during decoding needs > > to be maintained as we thought. > > Right. > > > But, how about with the above two > > parameters (enable_failover and enable_syncslot), we have > > standby_slot_names defined on the primary. 
That avoids the need to > > store the list of standby_slot_names in logical slots and simplifies > > the implementation quite a bit, right? > > Agree. > > > Now, one can think if we have a > > parameter like 'standby_slot_names' then why do we need > > enable_syncslot on physical standby but that will be required to > > invoke sync worker which will pull logical slot's information? > > yes and enable_sync slot on the standby could also be used to "pause" > the sync on standbys (by disabling the parameter) if one would want to > (without the need to modify anything on the primary). > > > The > > advantage of having standby_slot_names defined on primary is that we > > can selectively wait on the subset of physical standbys where we are > > syncing the slots. > > Yeah and this flexibility/filtering looks somehow mandatory to me. > > > I think this will be something similar to > > 'synchronous_standby_names' in the sense that the physical standbys > > mentioned in standby_slot_names will behave as synchronous copies with > > respect to slots and after failover user can switch to one of these > > physical standby and others can start following new master/publisher. > > > > Thoughts? > > I like the idea and I think that's the one that seems the more reasonable > to me. I'd vote for this idea with: > > - standby_slot_names on the primary (could also be set on standbys in case of > cascading context) > - enable_failover at logical slot creation + API to enable/disable it at wish > - enable_syncslot on the standbys > Thank you, Amit and Bertrand, for the feedback on the new design. PFA v23 patch set which attempts to implement the new proposed design to handle sync candidates: a) The synchronize_slot_names GUC is removed. Instead, the 'enable_failover' property is added at the slot level, which is persistent. It can be set by the user using the create-subscription command, e.g.: create subscription mysub connection '....' publication mypub WITH (enable_failover = true); b) New GUC enable_syncslot is added on standbys to enable/disable slot-sync on standbys c) standby_slot_names is maintained on the primary. The patch 002 also addresses Peter's comments dated Oct 6 and Oct 10. Thank you, Ajin, for implementing the 'create subscription' command changes to support the 'enable_failover' syntax. This patch has not implemented the items below yet; they will be done in the next version: --Provide support to set/alter enable_failover using alter-subscription and pg_create_logical_replication_slot --Changes needed to support slot-synchronization on cascading standbys --Display "enable_failover" property in pg_replication_slots. I think it makes sense to do this. thanks Shveta
Attachment
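To make the v23 proposal concrete, here is a minimal configuration sketch of how the three pieces described above are meant to fit together. The host, user, slot, subscription, and publication names are placeholders, and the GUC and option spellings are taken from the patch description in this thread, not from a released server:

    # primary, postgresql.conf: physical slots of the standbys that logical
    # decoding of failover-enabled slots should wait for
    standby_slot_names = 'standby1_phys_slot'

    # physical standby, postgresql.conf: turn slot synchronization on
    enable_syncslot = on
    hot_standby_feedback = on                # expected to be on for slot sync
    primary_slot_name = 'standby1_phys_slot'
    # dbname is used only by the slot-sync worker; it is ignored for
    # physical streaming
    primary_conninfo = 'host=primary port=5432 user=repl dbname=postgres'

    -- logical subscriber: mark the subscription's slot as a failover candidate
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=primary port=5432 user=repl dbname=postgres'
        PUBLICATION mypub
        WITH (enable_failover = true);

With this in place, slots marked enable_failover are the ones the standby's sync worker copies, and standby_slot_names controls which physical standbys the primary waits for, so that logical replication can be resumed from the new primary after a failover.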
On Tue, Oct 10, 2023 at 12:52 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Oct 9, 2023 at 9:34 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Oct 4, 2023 at 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Here are some review comments for v20-0002. > > > > > > > Thanks Peter for the feedback. Comments from 31 till end are addressed > > in v22. First 30 comments will be addressed in the next version. > > > > Thanks for addressing my previous comments. > > I checked those and went through other changes in v22-0002 to give a > few more review comments below. > Thank You for your feedback. I have addressed these in v23. > I understand there are some design changes coming soon regarding the > use of GUCs so maybe a few of these comments will become redundant. > > ====== > doc/src/sgml/config.sgml > > 1. > A password needs to be provided too, if the sender demands password > authentication. It can be provided in the > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > - <literal>replication</literal> as the database name). > - Do not specify a database name in the > - <varname>primary_conninfo</varname> string. > + <filename>~/.pgpass</filename> file on the standby server. > + </para> > + <para> > + Specify a database name in <varname>primary_conninfo</varname> string > + to allow synchronization of slots from the primary to standby. This > + dbname will only be used for slots synchronization purpose and will > + be irrelevant for streaming. > </para> > > 1a. > "Specify a database name in...". Shouldn't that say "Specify dbname in..."? > > ~ > > 1b. > BEFORE > This dbname will only be used for slots synchronization purpose and > will be irrelevant for streaming. > > SUGGESTION > This will only be used for slot synchronization. It is ignored for streaming. > > ====== > doc/src/sgml/system-views.sgml > > 2. pg_replication_slots > > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>synced_slot</structfield> <type>bool</type> > + </para> > + <para> > + True if this logical slot is created on physical standby as part of > + slot-synchronization from primary server. Always false for > physical slots. > + </para></entry> > + </row> > > /on physical standby/on the physical standby/ > > /from primary server/from the primary server/ > > ====== > src/backend/replication/logical/launcher.c > > 3. LaunchSlotSyncWorkers > > + /* > + * If we failed to launch this slotsync worker, return and try > + * launching rest of the workers in next sync cycle. But change > + * launcher's wait time to minimum of wal_retrieve_retry_interval and > + * default wait time to try next sync-cycle sooner. > + */ > > 3a. > Use consistent terms -- choose "sync cycle" or "sync-cycle" > > ~ > > 3b. > Is it correct to just say "rest of the workers"; won't it also try to > relaunch this same failed worker again? > > ~~~ > > 4. LauncherMain > > + /* > + * Stop the slot-sync workers if any of the related GUCs changed. > + * These will be relaunched using the new values during next > + * sync-cycle. Also revalidate the new configurations and > + * reconnect. > + */ > + if (SlotSyncConfigsChanged()) > + { > + slotsync_workers_stop(); > + > + if (wrconn) > + walrcv_disconnect(wrconn); > + > + if (RecoveryInProgress()) > + wrconn = slotsync_remote_connect(); > + } > > Was it overkill to disconnect/reconnect every time any of those GUCs > changed? 
Or is it enough to do that only if the > PrimaryConnInfoPreReload was changed? > > ====== > src/backend/replication/logical/logical.c > > 5. CreateDecodingContext > > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && slot->data.synced) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot use replication slot \"%s\" for logical decoding", > + NameStr(slot->data.name)), > + errdetail("This slot is being synced from primary."), > + errhint("Specify another replication slot."))); > + > > /from primary/from the primary/ > > ====== > src/backend/replication/logical/slotsync.c > > 6. use_slot_in_query > > + /* > + * Return TRUE if either slot is not yet created on standby or if it > + * belongs to one of the dbs passed in dbids. > + */ > + if (!slot_found || relevant_db) > + return true; > + > + return false; > > Same as single line: > > return (!slot_found || relevant_db); > > ~~~ > > 7. synchronize_one_slot > > + /* > + * If the local restart_lsn and/or local catalog_xmin is ahead of > + * those on the remote then we cannot create the local slot in sync > + * with primary because that would mean moving local slot backwards > + * and we might not have WALs retained for old lsns. In this case we > + * will wait for primary's restart_lsn and catalog_xmin to catch up > + * with the local one before attempting the sync. > + */ > > /moving local slot/moving the local slot/ > > /with primary/with the primary/ > > /wait for primary's/wait for the primary's/ > > ~~~ > > 8. ProcessSlotSyncInterrupts > > + if (ConfigReloadPending) > + { > + ConfigReloadPending = false; > + > + /* Save the PrimaryConnInfo before reloading */ > + *conninfo_prev = pstrdup(PrimaryConnInfo); > > If the configuration keeps changing then there might be a slow leak > here because I didn't notice anywhere where this strdup'ed string is > getting freed. Is that something worth worrying about? > > ====== > src/backend/replication/slot.c > > 9. ReplicationSlotDrop > > + /* > + * Do not allow users to drop the slots which are currently being synced > + * from the primary to standby. > + */ > + if (user_cmd && RecoveryInProgress() && MyReplicationSlot->data.synced) > + { > + ReplicationSlotRelease(); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot drop replication slot"), > + errdetail("This slot is being synced from primary."))); > + } > + > > 9a. > /to standby/to the standby/ > > ~ > > 9b. > Shouldn't the errmsg name the slot? Otherwise, the message might not > be so useful. > > ~ > > 9c. > /synced from primary/synced from the primary/ > > ====== > src/backend/replication/walsender.c > > > 10. ListSlotDatabaseOIDs > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > + for (slotno = 0; slotno < max_replication_slots; slotno++) > + { > + ReplicationSlot *slot = &ReplicationSlotCtl->replication_slots[slotno]; > > This is all new code so you can use C99 for loop variable declaration here. > > ~~~ > > 11. > + /* If synchronize_slot_names is '*', then skip physical slots */ > + if (SlotIsPhysical(slot)) > + continue; > + > > > Some mental gymnastics are needed to understand how this code means " > synchronize_slot_names is '*'". > > IMO it would be easier to understand if the previous "if > (numslot_names)" was rewritten as if/else. > > ====== > .../utils/activity/wait_event_names.txt > > 12. 
> RECOVERY_WAL_STREAM "Waiting in main loop of startup process for WAL > to arrive, during streaming recovery." > +REPL_SLOTSYNC_MAIN "Waiting in main loop of worker for synchronizing > slots to a standby from primary." > +REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for primary to catch-up in > worker for synchronizing slots to a standby from primary." > SYSLOGGER_MAIN "Waiting in main loop of syslogger process." > > 12a. > Maybe those descriptions can be simplified a bit? > > SUGGESTION > REPL_SLOTSYNC_MAIN "Waiting in the main loop of slot-sync worker." > REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for the primary to catch up, in > slot-sync worker." > > ~ > > 12b. > typo? > > /REPL_SLOTSYNC_PRIMARY_CATCHP/REPL_SLOTSYNC_PRIMARY_CATCHUP/ > > ====== > src/include/replication/walreceiver.h > > 13. WalRcvRepSlotDbData > > +/* > + * Slot's DBid related data > + */ > +typedef struct WalRcvRepSlotDbData > +{ > + Oid database; /* Slot's DBid received from remote */ > +} WalRcvRepSlotDbData; > > Just calling this new field 'database' seems odd. Searching PG src I > found typical fields/variables like this one are called 'databaseid', > or 'dboid', or 'dbid', or 'db_id' etc. > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
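As a small illustration of the synced_slot column discussed in comment 2 above, a query like the following on the physical standby would show which logical slots were created there by slot synchronization. The column name follows the quoted documentation change; this is only a sketch against the patched view:

    -- on the physical standby
    SELECT slot_name, database, slot_type, synced_slot
      FROM pg_replication_slots
     WHERE synced_slot;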
On Tue, Oct 10, 2023 at 12:52 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Oct 9, 2023 at 9:34 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Oct 4, 2023 at 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Here are some review comments for v20-0002. > > > > > > > Thanks Peter for the feedback. Comments from 31 till end are addressed > > in v22. First 30 comments will be addressed in the next version. > > > > Thanks for addressing my previous comments. > > I checked those and went through other changes in v22-0002 to give a > few more review comments below. > > I understand there are some design changes coming soon regarding the > use of GUCs so maybe a few of these comments will become redundant. > > ====== > doc/src/sgml/config.sgml > > 1. > A password needs to be provided too, if the sender demands password > authentication. It can be provided in the > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > - <literal>replication</literal> as the database name). > - Do not specify a database name in the > - <varname>primary_conninfo</varname> string. > + <filename>~/.pgpass</filename> file on the standby server. > + </para> > + <para> > + Specify a database name in <varname>primary_conninfo</varname> string > + to allow synchronization of slots from the primary to standby. This > + dbname will only be used for slots synchronization purpose and will > + be irrelevant for streaming. > </para> > > 1a. > "Specify a database name in...". Shouldn't that say "Specify dbname in..."? > > ~ > > 1b. > BEFORE > This dbname will only be used for slots synchronization purpose and > will be irrelevant for streaming. > > SUGGESTION > This will only be used for slot synchronization. It is ignored for streaming. > > ====== > doc/src/sgml/system-views.sgml > > 2. pg_replication_slots > > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>synced_slot</structfield> <type>bool</type> > + </para> > + <para> > + True if this logical slot is created on physical standby as part of > + slot-synchronization from primary server. Always false for > physical slots. > + </para></entry> > + </row> > > /on physical standby/on the physical standby/ > > /from primary server/from the primary server/ > > ====== > src/backend/replication/logical/launcher.c > > 3. LaunchSlotSyncWorkers > > + /* > + * If we failed to launch this slotsync worker, return and try > + * launching rest of the workers in next sync cycle. But change > + * launcher's wait time to minimum of wal_retrieve_retry_interval and > + * default wait time to try next sync-cycle sooner. > + */ > > 3a. > Use consistent terms -- choose "sync cycle" or "sync-cycle" > > ~ > > 3b. > Is it correct to just say "rest of the workers"; won't it also try to > relaunch this same failed worker again? > > ~~~ > > 4. LauncherMain > > + /* > + * Stop the slot-sync workers if any of the related GUCs changed. > + * These will be relaunched using the new values during next > + * sync-cycle. Also revalidate the new configurations and > + * reconnect. > + */ > + if (SlotSyncConfigsChanged()) > + { > + slotsync_workers_stop(); > + > + if (wrconn) > + walrcv_disconnect(wrconn); > + > + if (RecoveryInProgress()) > + wrconn = slotsync_remote_connect(); > + } > > Was it overkill to disconnect/reconnect every time any of those GUCs > changed? Or is it enough to do that only if the > PrimaryConnInfoPreReload was changed? 
> The intent is to re-validate all the related GUCs and then decide if we want to carry on with the slot-sync task or leave it as is. > ====== > src/backend/replication/logical/logical.c > > 5. CreateDecodingContext > > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && slot->data.synced) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot use replication slot \"%s\" for logical decoding", > + NameStr(slot->data.name)), > + errdetail("This slot is being synced from primary."), > + errhint("Specify another replication slot."))); > + > > /from primary/from the primary/ > > ====== > src/backend/replication/logical/slotsync.c > > 6. use_slot_in_query > > + /* > + * Return TRUE if either slot is not yet created on standby or if it > + * belongs to one of the dbs passed in dbids. > + */ > + if (!slot_found || relevant_db) > + return true; > + > + return false; > > Same as single line: > > return (!slot_found || relevant_db); > > ~~~ > > 7. synchronize_one_slot > > + /* > + * If the local restart_lsn and/or local catalog_xmin is ahead of > + * those on the remote then we cannot create the local slot in sync > + * with primary because that would mean moving local slot backwards > + * and we might not have WALs retained for old lsns. In this case we > + * will wait for primary's restart_lsn and catalog_xmin to catch up > + * with the local one before attempting the sync. > + */ > > /moving local slot/moving the local slot/ > > /with primary/with the primary/ > > /wait for primary's/wait for the primary's/ > > ~~~ > > 8. ProcessSlotSyncInterrupts > > + if (ConfigReloadPending) > + { > + ConfigReloadPending = false; > + > + /* Save the PrimaryConnInfo before reloading */ > + *conninfo_prev = pstrdup(PrimaryConnInfo); > > If the configuration keeps changing then there might be a slow leak > here because I didn't notice anywhere where this strdup'ed string is > getting freed. Is that something worth worrying about? > > ====== > src/backend/replication/slot.c > > 9. ReplicationSlotDrop > > + /* > + * Do not allow users to drop the slots which are currently being synced > + * from the primary to standby. > + */ > + if (user_cmd && RecoveryInProgress() && MyReplicationSlot->data.synced) > + { > + ReplicationSlotRelease(); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot drop replication slot"), > + errdetail("This slot is being synced from primary."))); > + } > + > > 9a. > /to standby/to the standby/ > > ~ > > 9b. > Shouldn't the errmsg name the slot? Otherwise, the message might not > be so useful. > > ~ > > 9c. > /synced from primary/synced from the primary/ > > ====== > src/backend/replication/walsender.c > > > 10. ListSlotDatabaseOIDs > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > + for (slotno = 0; slotno < max_replication_slots; slotno++) > + { > + ReplicationSlot *slot = &ReplicationSlotCtl->replication_slots[slotno]; > > This is all new code so you can use C99 for loop variable declaration here. > > ~~~ > > 11. > + /* If synchronize_slot_names is '*', then skip physical slots */ > + if (SlotIsPhysical(slot)) > + continue; > + > > > Some mental gymnastics are needed to understand how this code means " > synchronize_slot_names is '*'". > > IMO it would be easier to understand if the previous "if > (numslot_names)" was rewritten as if/else. > > ====== > .../utils/activity/wait_event_names.txt > > 12. 
> RECOVERY_WAL_STREAM "Waiting in main loop of startup process for WAL > to arrive, during streaming recovery." > +REPL_SLOTSYNC_MAIN "Waiting in main loop of worker for synchronizing > slots to a standby from primary." > +REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for primary to catch-up in > worker for synchronizing slots to a standby from primary." > SYSLOGGER_MAIN "Waiting in main loop of syslogger process." > > 12a. > Maybe those descriptions can be simplified a bit? > > SUGGESTION > REPL_SLOTSYNC_MAIN "Waiting in the main loop of slot-sync worker." > REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for the primary to catch up, in > slot-sync worker." > > ~ > > 12b. > typo? > > /REPL_SLOTSYNC_PRIMARY_CATCHP/REPL_SLOTSYNC_PRIMARY_CATCHUP/ > > ====== > src/include/replication/walreceiver.h > > 13. WalRcvRepSlotDbData > > +/* > + * Slot's DBid related data > + */ > +typedef struct WalRcvRepSlotDbData > +{ > + Oid database; /* Slot's DBid received from remote */ > +} WalRcvRepSlotDbData; > > Just calling this new field 'database' seems odd. Searching PG src I > found typical fields/variables like this one are called 'databaseid', > or 'dboid', or 'dbid', or 'db_id' etc. > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 10/6/23 6:48 PM, Amit Kapila wrote: > > > On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > >> > > >> On 10/4/23 1:50 PM, shveta malik wrote: > > >>> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >>>> > > >>>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand > > >>>> <bertranddrouvot.pg@gmail.com> wrote: > > >>>>> > > >>>>> On 10/4/23 6:26 AM, shveta malik wrote: > > >>>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >>>>>>> > > >>>>>>> > > >>>>>>> How about an alternate scheme where we define sync_slot_names on > > >>>>>>> standby but then store the physical_slot_name in the corresponding > > >>>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the > > >>>>>>> standby will send the list of 'sync_slot_names' and the primary will > > >>>>>>> add the physical standby's slot_name in each of the corresponding > > >>>>>>> sync_slot. Now, if we do this then even after restart, we should be > > >>>>>>> able to know for which physical slot each logical slot needs to wait. > > >>>>>>> We can even provide an SQL API to reset the value of > > >>>>>>> standby_slot_names in logical slots as a way to unblock decoding in > > >>>>>>> case of emergency (for example, corresponding when physical standby > > >>>>>>> never comes up). > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>>> Looks like a better approach to me. It solves most of the pain points like: > > >>>>>> 1) Avoids the need of multiple GUCs > > >>>>>> 2) Primary and standby need not to worry to be in sync if we maintain > > >>>>>> sync-slot-names GUC on both > > >>>> > > >>>> As per my understanding of this approach, we don't want > > >>>> 'sync-slot-names' to be set on the primary. Do you have a different > > >>>> understanding? > > >>>> > > >>> > > >>> Same understanding. We do not need it to be set on primary by user. It > > >>> will be GUC on standby and standby will convey it to primary. > > >> > > >> +1, same understanding here. > > >> > > > > > > At PGConf NYC, I had a brief discussion on this topic with Andres > > > where yet another approach to achieve this came up. > > > > Great! > > > > > Have a parameter > > > like enable_failover at the slot level (this will be persistent > > > information). Users can set it during the create/alter subscription or > > > via pg_create_logical_replication_slot(). Also, on physical standby, > > > there will be a parameter like enable_syncslot. All the physical > > > standbys that have set enable_syncslot will receive all the logical > > > slots that are marked as enable_failover. To me, whether to sync a > > > particular slot is a slot-level property, so defining it in this new > > > way seems reasonable. > > > > Yeah, as this is a slot-level property, I agree that this seems reasonable. > > > > Also that sounds more natural to me with this approach. The primary > > is really the one that "drives" which slots can be synced. I like it. > > > > One could also set enable_failover while creating a logical slot on a physical > > standby (so that cascading standbys could also have "extra slot" to sync as > > compare to "level 1" standbys). 
> > > > > > > > I think this will simplify the scheme a bit but still, the list of > > > physical standby's for which logical slots wait during decoding needs > > > to be maintained as we thought. > > > > Right. > > > > > But, how about with the above two > > > parameters (enable_failover and enable_syncslot), we have > > > standby_slot_names defined on the primary. That avoids the need to > > > store the list of standby_slot_names in logical slots and simplifies > > > the implementation quite a bit, right? > > > > Agree. > > > > > Now, one can think if we have a > > > parameter like 'standby_slot_names' then why do we need > > > enable_syncslot on physical standby but that will be required to > > > invoke sync worker which will pull logical slot's information? > > > > yes and enable_sync slot on the standby could also be used to "pause" > > the sync on standbys (by disabling the parameter) if one would want to > > (without the need to modify anything on the primary). > > > > > The > > > advantage of having standby_slot_names defined on primary is that we > > > can selectively wait on the subset of physical standbys where we are > > > syncing the slots. > > > > Yeah and this flexibility/filtering looks somehow mandatory to me. > > > > > I think this will be something similar to > > > 'synchronous_standby_names' in the sense that the physical standbys > > > mentioned in standby_slot_names will behave as synchronous copies with > > > respect to slots and after failover user can switch to one of these > > > physical standby and others can start following new master/publisher. > > > > > > Thoughts? > > > > I like the idea and I think that's the one that seems the more reasonable > > to me. I'd vote for this idea with: > > > > - standby_slot_names on the primary (could also be set on standbys in case of > > cascading context) > > - enable_failover at logical slot creation + API to enable/disable it at wish > > - enable_syncslot on the standbys > > > > Thank You Amit and Bertrand for feedback on the new design. > > PFA v23 patch set which attempts to implement the new proposed design > to handle sync candidates: > a) The synchronize_slot_names GUC is removed. Instead the > 'enable_failover' property is added at the slot level which is > persistent. It can be set by the user using create-subscription > command. eg: create subscription mysub connection '....' publication > mypub WITH (enable_failover = true); > b) New GUC enable_syncslot is added on standbys to enable disable > slot-sync on standbys > c) standby_slot_names are maintained on primary. > > The patch 002 also addresses Peter's comments dated Oct 6 and Oct10. > > Thank You Ajin for implementing 'create subscription' cmd changes to > support 'enable_failover' syntax. > > This patch has not implemented below yet, it will be done in next version: > --Provide support to set/alter enable_failover using > alter-subscription and pg_create_logical_replication_slot > --Changes needed to support slot-synchronization on cascading standbys > --Display "enable_failover" property in pg_replication_slots. I think > it makes sense to do this. > > thanks > Shveta PFA v24 patch set which has below changes: 1) 'enable_failover' displayed in pg_replication_slots. 2) Support for 'enable_failover' in pg_create_logical_replication_slot(). It is an optional argument with default value false. 3) Addressed pending comments (1-30) from Peter in [1]. 4) Fixed an issue in patch002 due to which even slots with enable_failover=false were getting synced. 
The changes for 1 and 2 are in patch001, while 3 and 4 are in patch002. Thanks, Ajin, for working on 1 and 3. [1]: https://www.postgresql.org/message-id/CAHut%2BPtbb3Ydx40a0p7Qovvp-4cC4ZCDreGRjmFzou8mjh2PmA%40mail.gmail.com Next to do: --Support for enable_failover in alter-subscription. --Support for slot-sync on cascading standbys. thanks Shveta
Attachment
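A short sketch of the SQL-level pieces added in v24, assuming both the optional function argument and the view column are exposed under the name enable_failover; the slot and plugin names are placeholders:

    -- create a failover-enabled logical slot directly; the new argument
    -- defaults to false when omitted
    SELECT pg_create_logical_replication_slot('myslot', 'pgoutput',
                                               enable_failover => true);

    -- confirm the property is displayed in the view
    SELECT slot_name, enable_failover
      FROM pg_replication_slots
     WHERE slot_name = 'myslot';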
On Wed, Oct 4, 2023 at 2:23 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v20-0002. > These comments below have been addressed in patch v24 posted by Shveta. > ====== > 1. GENERAL - errmsg/elog messages > > There are a a lot of minor problems and/or quirks across all the > message texts. Here is a summary of some I found: > > ERROR > errmsg("could not receive list of slots from the primary server: %s", > errmsg("invalid response from primary server"), > errmsg("invalid connection string syntax: %s", > errmsg("replication slot-sync worker slot %d is empty, cannot attach", > errmsg("replication slot-sync worker slot %d is already used by > another worker, cannot attach", > errmsg("replication slot-sync worker slot %d is already used by > another worker, cannot attach", > errmsg("could not connect to the primary server: %s", > > errmsg("operation not permitted on replication slots on standby which > are synchronized from primary"))); > /primary/the primary/ > == comment no longer part of patch. > errmsg("could not fetch invalidation cuase for slot \"%s\" from primary: %s", > /cuase/cause/ > /primary/the primary/ > == fixed > errmsg("slot \"%s\" disapeared from the primary", > /disapeared/disappeared/ > == fixed > errmsg("could not fetch slot info from the primary: %s", > errmsg("could not connect to the primary server: %s", err))); > errmsg("could not map dynamic shared memory segment for slot-sync worker"))); > > errmsg("physical replication slot %s found in synchronize_slot_names", > slot name not quoted? > --- == comment no longer part of patch > > WARNING > errmsg("out of background worker slots"), > > errmsg("Replication slot-sync worker failed to attach to worker-pool slot %d", > case? > > errmsg("Removed database %d from replication slot-sync worker %d; > dbcount now: %d", > case? > > errmsg("Skipping slots synchronization as primary_slot_name is not set.")); > case? > > errmsg("Skipping slots synchronization as hot_standby_feedback is off.")); > case? > > errmsg("Skipping slots synchronization as dbname is not specified in > primary_conninfo.")); > case? > > errmsg("slot-sync wait for slot %s interrupted by promotion, slot > creation aborted", > > errmsg("could not fetch slot info for slot \"%s\" from primary: %s", > /primary/the primary/ > > errmsg("slot \"%s\" disappeared from the primary, aborting slot creation", > errmsg("slot \"%s\" invalidated on primary, aborting slot creation", > > errmsg("slot-sync for slot %s interrupted by promotion, sync not possible", > slot name not quoted? > == fixed > errmsg("skipping sync of slot \"%s\" as the received slot-sync lsn > %X/%X is ahead of the standby position %X/%X", > > errmsg("not synchronizing slot %s; synchronization would move it backward", > slot name not quoted? > /backward/backwards/ > == comment is no longer part of the patch. > --- > > LOG > errmsg("Added database %d to replication slot-sync worker %d; dbcount now: %d", > errmsg("Added database %d to replication slot-sync worker %d; dbcount now: %d", > errmsg("Stopping replication slot-sync worker %d", > errmsg("waiting for remote slot \"%s\" LSN (%u/%X) and catalog xmin > (%u) to pass local slot LSN (%u/%X) and and catalog xmin (%u)", > > errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)and catalog > xmin (%u) has now passed local slot LSN (%X/%X) and catalog xmin > (%u)", > missing spaces? > == fixed > elog(LOG, "Dropped replication slot \"%s\" ", > extra space? > why this one is elog but others are not? 
> > elog(LOG, "Replication slot-sync worker %d is shutting down on > receiving SIGINT", MySlotSyncWorker->slot); > case? > why this one is elog but others are not? > > elog(LOG, "Replication slot-sync worker %d started", worker_slot); > case? > why this one is elog but others are not? > ---- > == changed these to ereports. > DEBUG1 > errmsg("allocated dsa for slot-sync worker for dbcount: %d" > worker number not given? > should be elog? > > errmsg_internal("logical replication launcher started") > should be elog? > == changed to elog > ---- > > DEBUG2 > elog(DEBUG2, "slot-sync worker%d's query:%s \n", > missing space after 'worker' > extra space before \n > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 2. libpqrcv_get_dbname_from_conninfo > > +/* > + * Get database name from primary conninfo. > + * > + * If dbanme is not found in connInfo, return NULL value. > + * The caller should take care of handling NULL value. > + */ > +static char * > +libpqrcv_get_dbname_from_conninfo(const char *connInfo) > > 2a. > /dbanme/dbname/ > == fixed > ~ > > 2b. > "The caller should take care of handling NULL value." > > IMO this is not very useful; it's like saying "caller must handle > function return values". > == removed > ~~~ > > 3. > + for (opt = opts; opt->keyword != NULL; ++opt) > + { > + /* Ignore connection options that are not present. */ > + if (opt->val == NULL) > + continue; > + > + if (strcmp(opt->keyword, "dbname") == 0 && opt->val[0] != '\0') > + { > + dbname = pstrdup(opt->val); > + } > + } > > 3a. > If there are multiple "dbname" in the conninfo then it will be the > LAST one that is returned. > > Judging by my quick syntax experiment (below) this seemed like the > correct thing to do, but I think there should be some comment to > explain about it. > > test_sub=# create subscription sub1 connection 'dbname=foo dbname=bar > dbname=test_pub' publication pub1; > 2023-09-28 19:15:15.012 AEST [23997] WARNING: subscriptions created > by regression test cases should have names starting with "regress_" > WARNING: subscriptions created by regression test cases should have > names starting with "regress_" > NOTICE: created replication slot "sub1" on publisher > CREATE SUBSCRIPTION > == added a comment saying that the last dbname would be selected. > ~ > > 3b. > The block brackets {} are not needed for the single statement. > == fixed > ~ > > 3c. > Since there is only one keyword of interest here it seemed overkill to > have a separate 'continue' check. Why not do everything in one line: > > for (opt = opts; opt->keyword != NULL; ++opt) > { > if (strcmp(opt->keyword, "dbname") == 0 && opt->val && opt->val[0] != '\0') > dbname = pstrdup(opt->val); > } > == fixed. > ====== > src/backend/replication/logical/launcher.c > > 4. > +/* > + * The local variables to store the current values of slot-sync related GUCs > + * before each ConfigReload. > + */ > +static char *PrimaryConnInfoPreReload = NULL; > +static char *PrimarySlotNamePreReload = NULL; > +static char *SyncSlotNamesPreReload = NULL; > > /The local variables/Local variables/ > == fixed. > ~~~ > > 5. fwd declare > > static void logicalrep_worker_cleanup(LogicalRepWorker *worker); > +static void slotsync_worker_cleanup(SlotSyncWorker *worker); > static int logicalrep_pa_worker_count(Oid subid); > > 5a. > Hmmn, I think there were lot more added static functions than just this one. > > e.g. what about all these? 
> static SlotSyncWorker *slotsync_worker_find > static dsa_handle slotsync_dsa_setup > static bool slotsync_worker_launch_or_reuse > static void slotsync_worker_stop_internal > static void slotsync_workers_stop > static void slotsync_remove_obsolete_dbs > static WalReceiverConn *primary_connect > static void SaveCurrentSlotSyncConfigs > static bool SlotSyncConfigsChanged > static void ApplyLauncherStartSlotSync > static void ApplyLauncherStartSubs > > ~ > > 5b. > There are inconsistent name style used for the new static functions -- > e.g. snake_case versus CamelCase. > == fixed. > ~~~ > > 6. WaitForReplicationWorkerAttach > > int rc; > + bool is_slotsync_worker = (lock == SlotSyncWorkerLock) ? true : false; > > This seemed a hacky way to distinguish the sync-slot workers from > other kinds of workers. Wouldn't it be better to pass another > parameter to this function? > == This was discussed and this seemed to be a simple way of doing this. > ~~~ > > 7. slotsync_worker_attach > > It looks like almost a clone of the logicalrep_worker_attach. Seems a > shame if cannot make use of common code. > == this was attempted but was found to require a lot of if conditions. > ~~~ > > 8. slotsync_worker_find > > + * Walks the slot-sync workers pool and searches for one that matches given > + * dbid. Since one worker can manage multiple dbs, so it walks the db array in > + * each worker to find the match. > > 8a. > SUGGESTION > Searches the slot-sync worker pool for the worker who manages the > specified dbid. Because a worker can manage multiple dbs, also walk > the db array of each worker to find the match. > > ~ > > 8b. > Should the comment also say something like "Returns NULL if no > matching worker is found." > == fixed > ~~~ > > 9. > + /* Search for attached worker for a given dbid */ > > SUGGESTION > Search for an attached worker managing the given dbid. > == fixed > ~~~ > > 10. > +{ > + int i; > + SlotSyncWorker *res = NULL; > + Oid *dbids; > + > + Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); > + > + /* Search for attached worker for a given dbid */ > + for (i = 0; i < max_slotsync_workers; i++) > + { > + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > + int cnt; > + > + if (!w->hdr.in_use) > + continue; > + > + dbids = (Oid *) dsa_get_address(w->dbids_dsa, w->dbids_dp); > + for (cnt = 0; cnt < w->dbcount; cnt++) > + { > + Oid wdbid = dbids[cnt]; > + > + if (wdbid == dbid) > + { > + res = w; > + break; > + } > + } > + > + /* If worker is found, break the outer loop */ > + if (res) > + break; > + } > + > + return res; > +} > > IMO this logical can be simplified a lot: > - by not using the 'res' variable; directly return instead. > - also moved the 'dbids' declaration. > - and 'cnt' variable seems not meaningful; replace with 'dbidx' for > the db array index IMO. > > For example (25 lines instead of 35 lines) > > { > int i; > > Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); > > /* Search for an attached worker managing the given dbid. */ > for (i = 0; i < max_slotsync_workers; i++) > { > SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > int dbidx; > Oid *dbids; > > if (!w->hdr.in_use) > continue; > > dbids = (Oid *) dsa_get_address(w->dbids_dsa, w->dbids_dp); > for (dbidx = 0; dbidx < w->dbcount; dbidx++) > { > if (dbids[dbidx] == dbid) > return w; > } > } > > return NULL; > } > == fixed > ~~~ > > 11. slot_sync_dsa_setup > > +/* > + * Setup DSA for slot-sync worker. > + * > + * DSA is needed for dbids array. 
Since max number of dbs a worker can manage > + * is not known, so initially fixed size to hold DB_PER_WORKER_ALLOC_INIT > + * dbs is allocated. If this size is exhausted, it can be extended using > + * dsa free and allocate routines. > + */ > +static dsa_handle > +slotsync_dsa_setup(SlotSyncWorker *worker, int alloc_db_count) > > 11a. > SUGGESTION > DSA is used for the dbids array. Because the maximum number of dbs a > worker can manage is not known, initially enough memory for > DB_PER_WORKER_ALLOC_INIT dbs is allocated. If this size is exhausted, > it can be extended using dsa free and allocate routines. > == fixed > ~ > > 11b. > It doesn't make sense for the comment to say DB_PER_WORKER_ALLOC_INIT > is the initial allocation, but then the function has a parameter > 'alloc_db_count' (which is always passed as DB_PER_WORKER_ALLOC_INIT). > IMO revemo the 2nd parameter from this function and hardwire the > initial allocation same as what the function comment says. > == fixed > ~~~ > > 12. > + /* Be sure any memory allocated by DSA routines is persistent. */ > + oldcontext = MemoryContextSwitchTo(TopMemoryContext); > > /Be sure any memory/Ensure the memory/ > == fixed > ~~~ > > 13. slotsync_worker_launch_or_reuse > > +/* > + * Slot-sync worker launch or reuse > + * > + * Start new slot-sync background worker from the pool of available workers > + * going by max_slotsync_workers count. If the worker pool is exhausted, > + * reuse the existing worker with minimum number of dbs. The idea is to > + * always distribute the dbs equally among launched workers. > + * If initially allocated dbids array is exhausted for the selected worker, > + * reallocate the dbids array with increased size and copy the existing > + * dbids to it and assign the new one as well. > + * > + * Returns true on success, false on failure. > + */ > > /going by/limited by/ (??) > == fixed > ~~~ > > 14. > + BackgroundWorker bgw; > + BackgroundWorkerHandle *bgw_handle; > + uint16 generation; > + SlotSyncWorker *worker = NULL; > + uint32 mindbcnt = 0; > + uint32 alloc_count = 0; > + uint32 copied_dbcnt = 0; > + Oid *copied_dbids = NULL; > + int worker_slot = -1; > + dsa_handle handle; > + Oid *dbids; > + int i; > + bool attach; > > IIUC many of these variables can be declared at a different scope in > this function, so they will be closer to where they are used. > == fixed > ~~~ > > 15. > + /* > + * We need to do the modification of the shared memory under lock so that > + * we have consistent view. > + */ > + LWLockAcquire(SlotSyncWorkerLock, LW_EXCLUSIVE); > > The current comment seems too much. > > SUGGESTION > The shared memory must only be modified under lock. > == fixed > ~~~ > > 16. > + /* Find unused worker slot. */ > + for (i = 0; i < max_slotsync_workers; i++) > + { > + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > + > + if (!w->hdr.in_use) > + { > + worker = w; > + worker_slot = i; > + break; > + } > + } > + > + /* > + * If all the workers are currently in use. Find the one with minimum > + * number of dbs and use that. > + */ > + if (!worker) > + { > + for (i = 0; i < max_slotsync_workers; i++) > + { > + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; > + > + if (i == 0) > + { > + mindbcnt = w->dbcount; > + worker = w; > + worker_slot = i; > + } > + else if (w->dbcount < mindbcnt) > + { > + mindbcnt = w->dbcount; > + worker = w; > + worker_slot = i; > + } > + } > + } > > Why not combine these 2 loops, to avoid iterating over the same slots > twice? 
Then, exit the loop immediately if unused worker found, > otherwise if reach the end of loop having not found anything unused > then you will already know the one having least dbs. > == fixed > ~~~ > > 17. > + /* Remember the old dbids before we reallocate dsa. */ > + copied_dbcnt = worker->dbcount; > + copied_dbids = (Oid *) palloc0(worker->dbcount * sizeof(Oid)); > + memcpy(copied_dbids, dbids, worker->dbcount * sizeof(Oid)); > > 17a. > Who frees this copied_dbids memory when you are finished needed it. It > seems allocated in the TopMemoryContext so IIUC this is a leak. > == fixed > ~ > > 17b. > These are the 'old' values. Not the 'copied' values. The copied_xxx > variable names seem misleading. > == fixed > ~~~ > > 18. > + /* Prepare the new worker. */ > + worker->hdr.launch_time = GetCurrentTimestamp(); > + worker->hdr.in_use = true; > > If a new worker is required then the launch_time is set like above. > > + { > + slot_db_data->last_launch_time = now; > + > + slotsync_worker_launch_or_reuse(slot_db_data->database); > + } > > Meanwhile, at the caller of slotsync_worker_launch_or_reuse(), the > dbid launch_time was already set as well. And those two timestamps are > almost (but not quite) the same value. Isn't that a bit strange? > == in the caller, the purpose of the timestamp is to calculate how long to wait before retrying. > ~~~ > > 19. > + /* Initial DSA setup for dbids array to hold DB_PER_WORKER_ALLOC_INIT dbs */ > + handle = slotsync_dsa_setup(worker, DB_PER_WORKER_ALLOC_INIT); > + dbids = (Oid *) dsa_get_address(worker->dbids_dsa, worker->dbids_dp); > + > + dbids[worker->dbcount++] = dbid; > > Where was this worker->dbcount assigned to 0? > > Maybe it's better to do this explicity under the "/* Prepare the new > worker. */" comment. > == dbcount is assigned 0 in the function called two lines above - slotsync_dsa_setup() > ~~~ > > 20. > + if (!attach) > + ereport(WARNING, > + (errmsg("Replication slot-sync worker failed to attach to " > + "worker-pool slot %d", worker_slot))); > + > + /* Attach is done, now safe to log that the worker is managing dbid */ > + if (attach) > + ereport(LOG, > + (errmsg("Added database %d to replication slot-sync " > + "worker %d; dbcount now: %d", > + dbid, worker_slot, worker->dbcount))); > > 20a. > IMO this should be coded as "if (attach) ...; else ..." > == fixed. > ~ > > 99b. > In other code if it failed to register then slotsync_worker_cleanup > code is called. How come similar code is not done when fails to > attach? > == WaitForReplicationWorkerAttach does the cleanup before returning false. > ~~~ > > 21. slotsync_worker_stop_internal > > +/* > + * Internal function to stop the slot-sync worker and wait until it detaches > + * from the slot-sync worker-pool slot. > + */ > +static void > +slotsync_worker_stop_internal(SlotSyncWorker *worker) > > IIUC this function does a bit more than what the function comment > says. IIUC (again) I think the "detached" worker slot will still be > flagged as 'inUse' but this function then does the extra step of > calling slotsync_worker_cleanup() function to make the worker slot > available for next process that needs it, am I correct? > > In this regard, this function seems a lot more like > logicalrep_worker_detach() function comment, so there seems some kind > of muddling of the different function names here... (??). > == modified the comment to mention the cleanup. > ~~~ > > 22. 
slotsync_remove_obsolete_dbs > > This function says: > +/* > + * Slot-sync workers remove obsolete DBs from db-list > + * > + * If the DBIds fetched from the primary are lesser than the ones being managed > + * by slot-sync workers, remove extra dbs from worker's db-list. This > may happen > + * if some slots are removed on primary but 'synchronize_slot_names' has not > + * been changed yet. > + */ > +static void > +slotsync_remove_obsolete_dbs(List *remote_dbs) > > But, there was another similar logic function too: > > +/* > + * Drop obsolete slots > + * > + * Drop the slots which no longer need to be synced i.e. these either > + * do not exist on primary or are no longer part of synchronize_slot_names. > + * > + * Also drop the slots which are valid on primary and got invalidated > + * on standby due to conflict (say required rows removed on primary). > + * The assumption is, these will get recreated in next sync-cycle and > + * it is okay to drop and recreate such slots as long as these are not > + * consumable on standby (which is the case currently). > + */ > +static void > +drop_obsolete_slots(Oid *dbids, List *remote_slot_list) > > Those function header comments suggest these have a lot of overlapping > functionality. > > Can't those 2 functions be combined? Or maybe one delegate to the other? > == One is called by the launcher, and the other is called by the slotsync worker. While one prunes the list of dbs that needs to be passed to each slot-sync worker, the other prunes the list of slots each slot-sync worker handles in its dblist. Both are different. > ~~~ > > 23. > + ListCell *lc; > + Oid *dbids; > + int widx; > + int dbidx; > + int i; > > Scope of some of these variable declarations can be different so they > are declared closer to where they are used. > == fixed > ~~~ > > 24. > + /* If not found, then delete this db from worker's db-list */ > + if (!found) > + { > + for (i = dbidx; i < worker->dbcount; i++) > + { > + /* Shift the DBs and get rid of wdbid */ > + if (i < (worker->dbcount - 1)) > + dbids[i] = dbids[i + 1]; > + } > > IIUC, that shift/loop could just have been a memmove() call to remove > one Oid element. > == fixed > ~~~ > > 25. > + /* If dbcount for any worker has become 0, shut it down */ > + for (widx = 0; widx < max_slotsync_workers; widx++) > + { > + SlotSyncWorker *worker = &LogicalRepCtx->ss_workers[widx]; > + > + if (worker->hdr.in_use && !worker->dbcount) > + slotsync_worker_stop_internal(worker); > + } > > Is it safe to stop this unguarded by SlotSyncWorkerLock locking? Is > there a window where another dbid decides to reuse this worker at the > same time this process is about to stop it? > == Only the launcher can do this, and there is only one launcher. > ~~~ > > 26. primary_connect > > +/* > + * Connect to primary server for slotsync purpose and return the connection > + * info. Disconnect previous connection if provided in wrconn_prev. > + */ > > /primary server/the primary server/ > == fixed > ~~~ > > 27. 
> + if (!RecoveryInProgress()) > + return NULL; > + > + if (max_slotsync_workers == 0) > + return NULL; > + > + if (strcmp(synchronize_slot_names, "") == 0) > + return NULL; > + > + /* The primary_slot_name is not set */ > + if (!WalRcv || WalRcv->slotname[0] == '\0') > + { > + ereport(WARNING, > + errmsg("Skipping slots synchronization as primary_slot_name " > + "is not set.")); > + return NULL; > + } > + > + /* The hot_standby_feedback must be ON for slot-sync to work */ > + if (!hot_standby_feedback) > + { > + ereport(WARNING, > + errmsg("Skipping slots synchronization as hot_standby_feedback " > + "is off.")); > + return NULL; > + } > > How come some of these checks giving WARNING that slot synchronization > will be skipped, but others are just silently returning NULL? > == primary_slot_name and hot_standby_feedback are not GUCs exclusive to slot synchronization, they are previously existing - so warning only for them. The others are specific to slot synchronization, so if users set them (which shows that the user intends to use sync-slot), then warning to let the user know that these others also need to be set. > ~~~ > > 28. SaveCurrentSlotSyncConfigs > > +static void > +SaveCurrentSlotSyncConfigs() > +{ > + PrimaryConnInfoPreReload = pstrdup(PrimaryConnInfo); > + PrimarySlotNamePreReload = pstrdup(WalRcv->slotname); > + SyncSlotNamesPreReload = pstrdup(synchronize_slot_names); > +} > > Shouldn't this code also do pfree first? Otherwise these will slowly > leak every time this function is called, right? > == fixed > ~~~ > > 29. SlotSyncConfigsChanged > > +static bool > +SlotSyncConfigsChanged() > +{ > + if (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0) > + return true; > + > + if (strcmp(PrimarySlotNamePreReload, WalRcv->slotname) != 0) > + return true; > + > + if (strcmp(SyncSlotNamesPreReload, synchronize_slot_names) != 0) > + return true; > > I felt those can all be combined to have 1 return instead of 3. > == fixed > ~~~ > > 30. > + /* > + * If we have reached this stage, it means original value of > + * hot_standby_feedback was 'true', so consider it changed if 'false' now. > + */ > + if (!hot_standby_feedback) > + return true; > > "If we have reached this stage" seems a bit vague. Can this have some > more explanation? And, maybe also an Assert(hot_standby_feedback); is > helpful in the calling code (before the config is reloaded)? > == rewrote this without that comment. regards, Ajin Cherian Fujitsu Australia
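To illustrate the rule settled on for comment 3a above (when dbname appears more than once, the last occurrence is used), a standby configuration like the following would have the slot-sync worker connect to postgres rather than foo; the host and user values are placeholders:

    # postgresql.conf on the physical standby
    primary_conninfo = 'host=primary user=repl dbname=foo dbname=postgres'
    # the slot-sync worker connects to "postgres" (the last dbname);
    # physical streaming ignores the dbname entirely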
FYI - the latest patch failed to apply. [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch error: patch failed: src/include/utils/guc_hooks.h:160 error: src/include/utils/guc_hooks.h: patch does not apply ====== Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250@gmail.com> wrote: > > FYI - the latest patch failed to apply. > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch > error: patch failed: src/include/utils/guc_hooks.h:160 > error: src/include/utils/guc_hooks.h: patch does not apply Rebased v24. PFA. thanks Shveta
Attachment
Hi, On 10/13/23 10:35 AM, shveta malik wrote: > On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: >> > > PFA v24 patch set which has below changes: > > 1) 'enable_failover' displayed in pg_replication_slots. > 2) Support for 'enable_failover' in > pg_create_logical_replication_slot(). It is an optional argument with > default value false. > 3) Addressed pending comments (1-30) from Peter in [1]. > 4) Fixed an issue in patch002 due to which even slots with > enable_failover=false were getting synced. > > The changes for 1 and 2 are in patch001 while 3 and 4 are in patch0002 > > Thanks Ajin, for working on 1 and 3. Thanks for the hard work! + if (RecoveryInProgress()) + wrconn = slotsync_remote_connect(NULL); does produce at compilation time: launcher.c:1916:40: warning: too many arguments in call to 'slotsync_remote_connect' wrconn = slotsync_remote_connect(NULL); Looking at 0001: commit message: "is added at the slot level which will be persistent information" what about "which is persistent information" ? Code: + True if this logical slot is enabled to be synced to the physical standbys + so that logical replication is not blocked after failover. Always false + for physical slots. Not sure "not blocked" is the right wording. "can be resumed from the new primary" maybe? +static void +ProcessRepliesAndTimeOut(void) +{ + CHECK_FOR_INTERRUPTS(); + + /* Process any requests or signals received recently */ + if (ConfigReloadPending) + { + ConfigReloadPending = false; + ProcessConfigFile(PGC_SIGHUP); + SyncRepInitConfig(); + SlotSyncInitConfig(); + } Do we want to do this at each place ProcessRepliesAndTimeOut() is being called? I mean before this change it was not done in ProcessPendingWrites(). + * Wait for physical standby to confirm receiving give lsn. typo? s/give/given/ diff --git a/src/test/recovery/t/050_verify_slot_order.pl b/src/test/recovery/t/050_verify_slot_order.pl new file mode 100644 index 0000000000..25b3d5aac2 --- /dev/null +++ b/src/test/recovery/t/050_verify_slot_order.pl @@ -0,0 +1,145 @@ + +# Copyright (c) 2023, PostgreSQL Global Development Group + Regarding the TAP tests, should we also add some testing related to enable_failover being set in pg_create_logical_replication_slot() and pg_logical_slot_get_changes() behavior too? Please note that current comments are coming while "quickly" going through 0001. I'm planning to have a closer look at 0001 and 0002 too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
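On the TAP-test suggestion, the behaviors such tests could pin down might look roughly like this at the SQL level. The slot name is a placeholder, the enable_failover argument name is assumed from the patch description, and the error text follows the snippet quoted earlier in the thread rather than a released server:

    -- on the primary: a failover-enabled slot created via SQL
    SELECT pg_create_logical_replication_slot('failover_slot', 'test_decoding',
                                               enable_failover => true);
    -- with standby_slot_names set, consuming this slot is expected to wait
    -- until the listed physical standbys have confirmed the WAL position
    SELECT * FROM pg_logical_slot_get_changes('failover_slot', NULL, NULL);

    -- on the standby: consuming a synced copy of the slot should fail until
    -- promotion, e.g. "cannot use replication slot ... for logical decoding"
    SELECT * FROM pg_logical_slot_get_changes('failover_slot', NULL, NULL);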
On Tue, Oct 17, 2023 at 9:06 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/13/23 10:35 AM, shveta malik wrote: > > On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > >> > > > > PFA v24 patch set which has below changes: > > > > 1) 'enable_failover' displayed in pg_replication_slots. > > 2) Support for 'enable_failover' in > > pg_create_logical_replication_slot(). It is an optional argument with > > default value false. > > 3) Addressed pending comments (1-30) from Peter in [1]. > > 4) Fixed an issue in patch002 due to which even slots with > > enable_failover=false were getting synced. > > > > The changes for 1 and 2 are in patch001 while 3 and 4 are in patch0002 > > > > Thanks Ajin, for working on 1 and 3. > > Thanks for the hard work! > Thanks for the feedback. I will try to address these in the next 1-2 versions. > + if (RecoveryInProgress()) > + wrconn = slotsync_remote_connect(NULL); > > does produce at compilation time: > > launcher.c:1916:40: warning: too many arguments in call to 'slotsync_remote_connect' > wrconn = slotsync_remote_connect(NULL); > > Looking at 0001: > > commit message: > > "is added at the slot level which > will be persistent information" > > what about "which is persistent information" ? > > Code: > > + True if this logical slot is enabled to be synced to the physical standbys > + so that logical replication is not blocked after failover. Always false > + for physical slots. > > Not sure "not blocked" is the right wording. "can be resumed from the new primary" maybe? > > +static void > +ProcessRepliesAndTimeOut(void) > +{ > + CHECK_FOR_INTERRUPTS(); > + > + /* Process any requests or signals received recently */ > + if (ConfigReloadPending) > + { > + ConfigReloadPending = false; > + ProcessConfigFile(PGC_SIGHUP); > + SyncRepInitConfig(); > + SlotSyncInitConfig(); > + } > > Do we want to do this at each place ProcessRepliesAndTimeOut() is being > called? I mean before this change it was not done in ProcessPendingWrites(). > Are you referring to ConfigReload stuff ? I see that even in ProcessPendingWrites(), we do it after WalSndWait(). Now only the order is changed, it is before WalSndWait() now. > + * Wait for physical standby to confirm receiving give lsn. > > typo? s/give/given/ > > > diff --git a/src/test/recovery/t/050_verify_slot_order.pl b/src/test/recovery/t/050_verify_slot_order.pl > new file mode 100644 > index 0000000000..25b3d5aac2 > --- /dev/null > +++ b/src/test/recovery/t/050_verify_slot_order.pl > @@ -0,0 +1,145 @@ > + > +# Copyright (c) 2023, PostgreSQL Global Development Group > + > > Regarding the TAP tests, should we also add some testing related to enable_failover being set > in pg_create_logical_replication_slot() and pg_logical_slot_get_changes() behavior too? > Sure, will do it. > Please note that current comments are coming while > "quickly" going through 0001. > > I'm planning to have a closer look at 0001 and 0002 too. > Yes, that will be really helpful. Thanks. thanks Shveta
On Tue, Oct 17, 2023 at 9:06 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/13/23 10:35 AM, shveta malik wrote: > > On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > >> > > Code: > > + True if this logical slot is enabled to be synced to the physical standbys > + so that logical replication is not blocked after failover. Always false > + for physical slots. > > Not sure "not blocked" is the right wording. "can be resumed from the new primary" maybe? > Yeah, your proposed wording sounds better. Also, I think we should document the impact of not doing so because I think the replication can continue after failover but it may lead to data inconsistency. BTW, I noticed that the code for Create Subscription is updated but not the corresponding docs. By looking at other parameters like password_required, streaming, two_phase where true or false indicates whether that option is enabled or not, I am thinking about whether enable_failover is an appropriate name for this option. The other option name that comes to mind is 'failover' where true indicates that the corresponding subscription will be enabled for failover. What do you think? -- With Regards, Amit Kapila.
On Wed, Oct 18, 2023 at 10:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 17, 2023 at 9:06 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > On 10/13/23 10:35 AM, shveta malik wrote: > > > On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > > >> > > > > Code: > > > > + True if this logical slot is enabled to be synced to the physical standbys > > + so that logical replication is not blocked after failover. Always false > > + for physical slots. > > > > Not sure "not blocked" is the right wording. "can be resumed from the new primary" maybe? > > > > Yeah, your proposed wording sounds better. Also, I think we should > document the impact of not doing so because I think the replication > can continue after failover but it may lead to data inconsistency. > > BTW, I noticed that the code for Create Subscription is updated but > not the corresponding docs. By looking at other parameters like > password_required, streaming, two_phase where true or false indicates > whether that option is enabled or not, I am thinking about whether > enable_failover is an appropriate name for this option. The other > option name that comes to mind is 'failover' where true indicates that > the corresponding subscription will be enabled for failover. What do > you think? +1. 'failover' seems more in sync with other options' names. > -- > With Regards, > Amit Kapila.
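With the 'failover' naming settled on above, the user-facing calls sketched in the patch set would look roughly like this. Slot, subscription, publication and connection names are purely illustrative, and the position of the optional failover argument is as proposed in v24/v25, not a released interface:

    -- logical slot with the proposed optional failover flag (defaults to false)
    SELECT pg_create_logical_replication_slot('myslot', 'test_decoding', false, false, true);

    -- subscription that requests a failover-enabled slot on the publisher
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=primary dbname=postgres'
        PUBLICATION mypub
        WITH (failover = true);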
On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > FYI - the latest patch failed to apply. > > > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch > > error: patch failed: src/include/utils/guc_hooks.h:160 > > error: src/include/utils/guc_hooks.h: patch does not apply > > Rebased v24. PFA. > Few comments: ============== 1. + List of physical replication slots that logical replication with failover + enabled waits for. /logical replication/logical replication slots 2. If + <varname>enable_syncslot</varname> is not enabled on the + corresponding standbys, then it may result in indefinite waiting + on the primary for physical replication slots configured in + <varname>standby_slot_names</varname> + </para> Why the above leads to indefinite wait? I think we should just ignore standby_slot_names and probably LOG a message in the server for the same. 3. +++ b/src/backend/replication/logical/tablesync.c @@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) */ walrcv_create_slot(LogRepWorkerWalRcvConn, slotname, false /* permanent */ , false /* two_phase */ , - CRS_USE_SNAPSHOT, origin_startpos); + false /* enable_failover */ , CRS_USE_SNAPSHOT, + origin_startpos); As per this code, we won't enable failover for tablesync slots. So, what happens if we need to failover to new node after the tablesync worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC? I think we won't be able to continue replication from failed over node. If this theory is correct, we have two options (a) enable failover for sync slots as well, if it is enabled for main slot; but then after we drop the slot on primary once sync is complete, same needs to be taken care at standby. (b) enable failover even for the main slot after all tables are in ready state, something similar to what we do for two_phase. 4. + /* Verify syntax */ + if (!validate_slot_names(newval, &elemlist)) + return false; + + /* Now verify if these really exist and have correct type */ + if (!validate_standby_slots(elemlist)) These two functions serve quite similar functionality which makes their naming quite confusing. Can we directly move the functionality of validate_slot_names() into validate_standby_slots()? 5. +SlotSyncInitConfig(void) +{ + char *rawname; + + /* Free the old one */ + list_free(standby_slot_names_list); + standby_slot_names_list = NIL; + + if (strcmp(standby_slot_names, "") != 0) + { + rawname = pstrdup(standby_slot_names); + SplitIdentifierString(rawname, ',', &standby_slot_names_list); How does this handle the case where '*' is specified for standby_slot_names? -- With Regards, Amit Kapila.
On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > FYI - the latest patch failed to apply. > > > > > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > > > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch > > > error: patch failed: src/include/utils/guc_hooks.h:160 > > > error: src/include/utils/guc_hooks.h: patch does not apply > > > > Rebased v24. PFA. > > > > Few comments: > ============== > 1. > + List of physical replication slots that logical replication > with failover > + enabled waits for. > > /logical replication/logical replication slots > > 2. > If > + <varname>enable_syncslot</varname> is not enabled on the > + corresponding standbys, then it may result in indefinite waiting > + on the primary for physical replication slots configured in > + <varname>standby_slot_names</varname> > + </para> > > Why the above leads to indefinite wait? I think we should just ignore > standby_slot_names and probably LOG a message in the server for the > same. > > 3. > +++ b/src/backend/replication/logical/tablesync.c > @@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) > */ > walrcv_create_slot(LogRepWorkerWalRcvConn, > slotname, false /* permanent */ , false /* two_phase */ , > - CRS_USE_SNAPSHOT, origin_startpos); > + false /* enable_failover */ , CRS_USE_SNAPSHOT, > + origin_startpos); > > As per this code, we won't enable failover for tablesync slots. So, > what happens if we need to failover to new node after the tablesync > worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC? > I think we won't be able to continue replication from failed over > node. If this theory is correct, we have two options (a) enable > failover for sync slots as well, if it is enabled for main slot; but > then after we drop the slot on primary once sync is complete, same > needs to be taken care at standby. (b) enable failover even for the > main slot after all tables are in ready state, something similar to > what we do for two_phase. > > 4. > + /* Verify syntax */ > + if (!validate_slot_names(newval, &elemlist)) > + return false; > + > + /* Now verify if these really exist and have correct type */ > + if (!validate_standby_slots(elemlist)) > > These two functions serve quite similar functionality which makes > their naming quite confusing. Can we directly move the functionality > of validate_slot_names() into validate_standby_slots()? > > 5. > +SlotSyncInitConfig(void) > +{ > + char *rawname; > + > + /* Free the old one */ > + list_free(standby_slot_names_list); > + standby_slot_names_list = NIL; > + > + if (strcmp(standby_slot_names, "") != 0) > + { > + rawname = pstrdup(standby_slot_names); > + SplitIdentifierString(rawname, ',', &standby_slot_names_list); > > How does this handle the case where '*' is specified for standby_slot_names? > > > -- > With Regards, > Amit Kapila. PFA v25 patch set. The changes are: 1) 'enable_failover' is changed to 'failover' 2) Alter subscription changes to support 'failover' 3) Fixes a bug in patch001 wherein any change in standby_slot_names was not considered in the flow where logical walsenders wait for standby's confirmation. Now during the wait, if standby_slot_names is changed, wait is restarted using new standby_slot_names. 
4) Addresses comments by Bertrand and Amit in [1],[2],[3] The changes are mostly in patch001 and a very few in patch002. Thank You Ajin for working on alter-subscription changes and adding more TAP-tests for 'failover' [1]: https://www.postgresql.org/message-id/2742485f-4118-4fb4-9f94-8150de9e7d7e%40gmail.com [2]: https://www.postgresql.org/message-id/CAA4eK1JcBG6TJ3o5iUd4z0BuTbciLV3dK4aKgb7OgrNGoLcfSQ%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAA4eK1J6BqO5%3DueFAQO%2BaYyHLaU-oCHrrVFJqHS-i0Ce9aPY2w%40mail.gmail.com thanks Shveta
Attachment
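As a rough usage sketch of items 1 and 2 above (v25's ALTER SUBSCRIPTION support plus the renamed option; subscription and slot names are illustrative, and the exact preconditions for changing the option may differ in later versions):

    ALTER SUBSCRIPTION mysub SET (failover = true);

    -- on the publisher, the subscription's slot should now report the flag
    -- (failover column as added by patch 0001)
    SELECT slot_name, failover FROM pg_replication_slots WHERE slot_name = 'mysub';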
On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 17, 2023 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Oct 17, 2023 at 12:44 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > FYI - the latest patch failed to apply. > > > > > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > > > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch > > > error: patch failed: src/include/utils/guc_hooks.h:160 > > > error: src/include/utils/guc_hooks.h: patch does not apply > > > > Rebased v24. PFA. > > > > Few comments: > ============== > 1. > + List of physical replication slots that logical replication > with failover > + enabled waits for. > > /logical replication/logical replication slots > > 2. > If > + <varname>enable_syncslot</varname> is not enabled on the > + corresponding standbys, then it may result in indefinite waiting > + on the primary for physical replication slots configured in > + <varname>standby_slot_names</varname> > + </para> > > Why the above leads to indefinite wait? I think we should just ignore > standby_slot_names and probably LOG a message in the server for the > same. > Sorry for confusion. This info was wrong, I have corrected it. > 3. > +++ b/src/backend/replication/logical/tablesync.c > @@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) > */ > walrcv_create_slot(LogRepWorkerWalRcvConn, > slotname, false /* permanent */ , false /* two_phase */ , > - CRS_USE_SNAPSHOT, origin_startpos); > + false /* enable_failover */ , CRS_USE_SNAPSHOT, > + origin_startpos); > > As per this code, we won't enable failover for tablesync slots. So, > what happens if we need to failover to new node after the tablesync > worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC? > I think we won't be able to continue replication from failed over > node. If this theory is correct, we have two options (a) enable > failover for sync slots as well, if it is enabled for main slot; but > then after we drop the slot on primary once sync is complete, same > needs to be taken care at standby. (b) enable failover even for the > main slot after all tables are in ready state, something similar to > what we do for two_phase. I have adopted approach a) right now. Table sync slot is created with subscription's failover option and will be dropped by standby if it dropped on primary. > > 4. > + /* Verify syntax */ > + if (!validate_slot_names(newval, &elemlist)) > + return false; > + > + /* Now verify if these really exist and have correct type */ > + if (!validate_standby_slots(elemlist)) > > These two functions serve quite similar functionality which makes > their naming quite confusing. Can we directly move the functionality > of validate_slot_names() into validate_standby_slots()? > > 5. > +SlotSyncInitConfig(void) > +{ > + char *rawname; > + > + /* Free the old one */ > + list_free(standby_slot_names_list); > + standby_slot_names_list = NIL; > + > + if (strcmp(standby_slot_names, "") != 0) > + { > + rawname = pstrdup(standby_slot_names); > + SplitIdentifierString(rawname, ',', &standby_slot_names_list); > > How does this handle the case where '*' is specified for standby_slot_names? > I have removed '*' related doc info in this patch and has introduced error if '*' is given for this GUC. The reason being, I do not see a way to figure out all physical standbys slot names on a primary to make '*' work. 
We have information about all the physical slots created on the primary, but it is not known which of them are actually being used by standbys. Thoughts? thanks Shveta
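To make the above concrete: with '*' rejected, the GUC has to name the standbys' slots explicitly, and the closest the primary can get to "which physical slots are in use" is whether a walsender is currently attached. A sketch, assuming the patch's standby_slot_names GUC and purely illustrative slot names:

    ALTER SYSTEM SET standby_slot_names = 'standby1_slot, standby2_slot';
    SELECT pg_reload_conf();

    -- physical slots that currently have a walsender attached (i.e. in use right now);
    -- slots used only intermittently would still not show up here
    SELECT s.slot_name, s.active_pid, r.application_name, r.state
    FROM pg_replication_slots s
    LEFT JOIN pg_stat_replication r ON r.pid = s.active_pid
    WHERE s.slot_type = 'physical';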
On Tue, Oct 17, 2023 at 9:06 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/13/23 10:35 AM, shveta malik wrote: > > On Thu, Oct 12, 2023 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > >> > > > > PFA v24 patch set which has below changes: > > > > 1) 'enable_failover' displayed in pg_replication_slots. > > 2) Support for 'enable_failover' in > > pg_create_logical_replication_slot(). It is an optional argument with > > default value false. > > 3) Addressed pending comments (1-30) from Peter in [1]. > > 4) Fixed an issue in patch002 due to which even slots with > > enable_failover=false were getting synced. > > > > The changes for 1 and 2 are in patch001 while 3 and 4 are in patch0002 > > > > Thanks Ajin, for working on 1 and 3. > > Thanks for the hard work! > > + if (RecoveryInProgress()) > + wrconn = slotsync_remote_connect(NULL); > > does produce at compilation time: > > launcher.c:1916:40: warning: too many arguments in call to 'slotsync_remote_connect' > wrconn = slotsync_remote_connect(NULL); > > Looking at 0001: > > commit message: > > "is added at the slot level which > will be persistent information" > > what about "which is persistent information" ? > > Code: > > + True if this logical slot is enabled to be synced to the physical standbys > + so that logical replication is not blocked after failover. Always false > + for physical slots. > > Not sure "not blocked" is the right wording. "can be resumed from the new primary" maybe? > > +static void > +ProcessRepliesAndTimeOut(void) > +{ > + CHECK_FOR_INTERRUPTS(); > + > + /* Process any requests or signals received recently */ > + if (ConfigReloadPending) > + { > + ConfigReloadPending = false; > + ProcessConfigFile(PGC_SIGHUP); > + SyncRepInitConfig(); > + SlotSyncInitConfig(); > + } > > Do we want to do this at each place ProcessRepliesAndTimeOut() is being > called? I mean before this change it was not done in ProcessPendingWrites(). > > + * Wait for physical standby to confirm receiving give lsn. > > typo? s/give/given/ > > > diff --git a/src/test/recovery/t/050_verify_slot_order.pl b/src/test/recovery/t/050_verify_slot_order.pl > new file mode 100644 > index 0000000000..25b3d5aac2 > --- /dev/null > +++ b/src/test/recovery/t/050_verify_slot_order.pl > @@ -0,0 +1,145 @@ > + > +# Copyright (c) 2023, PostgreSQL Global Development Group > + > > Regarding the TAP tests, should we also add some testing related to enable_failover being set > in pg_create_logical_replication_slot() and pg_logical_slot_get_changes() behavior too? > We have added some basic tests in v25. More detailed tests to be added in coming versions. > Please note that current comments are coming while > "quickly" going through 0001. > > I'm planning to have a closer look at 0001 and 0002 too. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
Hi, On 10/18/23 6:43 AM, shveta malik wrote: > On Tue, Oct 17, 2023 at 9:06 PM Drouvot, Bertrand >> +static void >> +ProcessRepliesAndTimeOut(void) >> +{ >> + CHECK_FOR_INTERRUPTS(); >> + >> + /* Process any requests or signals received recently */ >> + if (ConfigReloadPending) >> + { >> + ConfigReloadPending = false; >> + ProcessConfigFile(PGC_SIGHUP); >> + SyncRepInitConfig(); >> + SlotSyncInitConfig(); >> + } >> >> Do we want to do this at each place ProcessRepliesAndTimeOut() is being >> called? I mean before this change it was not done in ProcessPendingWrites(). >> > > Are you referring to ConfigReload stuff ? I see that even in > ProcessPendingWrites(), we do it after WalSndWait(). Now only the > order is changed, it is before WalSndWait() now. Yeah, and the CFI. With the patch, the CFI and the check on ConfigReloadPending are done in all cases, as the break (if !pq_is_send_pending()) now comes after them. That seems OK; I just wanted to mention it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 10/20/23 5:27 AM, shveta malik wrote: > On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > PFA v25 patch set. The changes are: > > 1) 'enable_failover' is changed to 'failover' > 2) Alter subscription changes to support 'failover' > 3) Fixes a bug in patch001 wherein any change in standby_slot_names > was not considered in the flow where logical walsenders wait for > standby's confirmation. Now during the wait, if standby_slot_names is > changed, wait is restarted using new standby_slot_names. > 4) Addresses comments by Bertrand and Amit in [1],[2],[3] > > The changes are mostly in patch001 and a very few in patch002. > > Thank You Ajin for working on alter-subscription changes and adding > more TAP-tests for 'failover' > Thanks for updating the patch! Looking at 0001 and doing some experiment: Creating a logical slot with failover = true and then launching pg_logical_slot_get_changes() or pg_recvlogical() on it results to setting failover back to false. It occurs while creating the decoding context here: @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn); } + /* set failover in the slot, as requested */ + slot->data.failover = ctx->failover; + I think we can get rid of this change in CreateDecodingContext(). Looking at 0002: /* Enter main loop */ for (;;) { int rc; long wait_time = DEFAULT_NAPTIME_PER_CYCLE; CHECK_FOR_INTERRUPTS(); /* * If it is Hot standby, then try to launch slot-sync workers else * launch apply workers. */ if (RecoveryInProgress()) { /* Launch only if we have succesfully made the connection */ if (wrconn) LaunchSlotSyncWorkers(&wait_time, wrconn); } We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there is new synced slot(s) to be created on the standby. Do we want to keep this behavior for V1? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
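The behaviour described here can be reproduced with plain SQL, assuming the v25 signature where failover is the last optional argument (slot name and plugin are illustrative):

    SELECT pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, false, true);

    SELECT failover FROM pg_replication_slots WHERE slot_name = 'regression_slot';  -- reports true

    -- consuming changes goes through CreateDecodingContext() ...
    SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

    -- ... after which, with v25 applied, the flag has been reset
    SELECT failover FROM pg_replication_slots WHERE slot_name = 'regression_slot';  -- reports false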
On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/20/23 5:27 AM, shveta malik wrote: > > On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > PFA v25 patch set. The changes are: > > > > 1) 'enable_failover' is changed to 'failover' > > 2) Alter subscription changes to support 'failover' > > 3) Fixes a bug in patch001 wherein any change in standby_slot_names > > was not considered in the flow where logical walsenders wait for > > standby's confirmation. Now during the wait, if standby_slot_names is > > changed, wait is restarted using new standby_slot_names. > > 4) Addresses comments by Bertrand and Amit in [1],[2],[3] > > > > The changes are mostly in patch001 and a very few in patch002. > > > > Thank You Ajin for working on alter-subscription changes and adding > > more TAP-tests for 'failover' > > > > Thanks for updating the patch! > > Looking at 0001 and doing some experiment: > > Creating a logical slot with failover = true and then launching > pg_logical_slot_get_changes() or pg_recvlogical() on it results > to setting failover back to false. > > It occurs while creating the decoding context here: > > @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, > SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn); > } > > + /* set failover in the slot, as requested */ > + slot->data.failover = ctx->failover; > + > > I think we can get rid of this change in CreateDecodingContext(). > Thanks for pointing it out. I will correct it in the next patch. > Looking at 0002: > > /* Enter main loop */ > for (;;) > { > int rc; > long wait_time = DEFAULT_NAPTIME_PER_CYCLE; > > CHECK_FOR_INTERRUPTS(); > > /* > * If it is Hot standby, then try to launch slot-sync workers else > * launch apply workers. > */ > if (RecoveryInProgress()) > { > /* Launch only if we have succesfully made the connection */ > if (wrconn) > LaunchSlotSyncWorkers(&wait_time, wrconn); > } > > We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > is new synced slot(s) to be created on the standby. Do we want to keep this behavior > for V1? > I think for the slotsync workers case, we should reduce the naptime in the launcher to say 30sec and retain the default one of 3mins for subscription apply workers. Thoughts? > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Friday, October 20, 2023 11:27 AM shveta malik <shveta.malik@gmail.com> wrote: > > The changes are mostly in patch001 and a very few in patch002. > > Thank You Ajin for working on alter-subscription changes and adding more > TAP-tests for 'failover' Thanks for updating the patch. Here are a few things I noticed when testing the patch. 1) +++ b/src/backend/replication/logical/logical.c @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn); } + /* set failover in the slot, as requested */ + slot->data.failover = ctx->failover; + I noticed others have also commented on this change. I found that it overwrites the slot's failover value with a wrong one in the case where we have acquired the slot and call CreateDecodingContext() after that (e.g. it can be a problem if a user calls pg_logical_slot_get_changes() for a logical slot). 2) WalSndWaitForStandbyConfirmation ... +void +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) +{ + List *standby_slot_cpy; + + if (!MyReplicationSlot->data.failover) + return; + + standby_slot_cpy = list_copy(standby_slot_names_list); + The standby list could be uninitialized when this is called from a non-walsender backend (e.g. via pg_logical_slot_get_changes()), so we need to call SlotSyncInitConfig() somewhere in this case. Best Regards, Hou zj
On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/20/23 5:27 AM, shveta malik wrote: > > On Wed, Oct 18, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > PFA v25 patch set. The changes are: > > > > 1) 'enable_failover' is changed to 'failover' > > 2) Alter subscription changes to support 'failover' > > 3) Fixes a bug in patch001 wherein any change in standby_slot_names > > was not considered in the flow where logical walsenders wait for > > standby's confirmation. Now during the wait, if standby_slot_names is > > changed, wait is restarted using new standby_slot_names. > > 4) Addresses comments by Bertrand and Amit in [1],[2],[3] > > > > The changes are mostly in patch001 and a very few in patch002. > > > > Thank You Ajin for working on alter-subscription changes and adding > > more TAP-tests for 'failover' > > > > Thanks for updating the patch! > > Looking at 0001 and doing some experiment: > > Creating a logical slot with failover = true and then launching > pg_logical_slot_get_changes() or pg_recvlogical() on it results > to setting failover back to false. > > It occurs while creating the decoding context here: > > @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, > SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn); > } > > + /* set failover in the slot, as requested */ > + slot->data.failover = ctx->failover; > + > > I think we can get rid of this change in CreateDecodingContext(). > Yes, I too noticed this in my testing, however just removing this from CreateDecodingContext will not allow us to change the slot's failover flag using Alter subscription. Currently alter subscription re-establishes the connection using START REPLICATION and failover is one of the options passed in along with START REPLICATION. I am thinking of moving this change to StartLogicalReplication prior to calling CreateDecodingContext by parsing the command options in StartReplicationCmd without adding it to the LogicalDecodingContext. regards, Ajin Cherian Fujitsu Australia
Hi, On 10/23/23 2:56 PM, shveta malik wrote: > On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there >> is new synced slot(s) to be created on the standby. Do we want to keep this behavior >> for V1? >> > > I think for the slotsync workers case, we should reduce the naptime in > the launcher to say 30sec and retain the default one of 3mins for > subscription apply workers. Thoughts? > Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new API on the standby that would refresh the list of sync slot at wish, thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Here are some review comments for v24-0001 ====== 1. GENERAL - failover slots terminology There is inconsistent terminology, such as below. Try to use the same wording everywhere. - failover logical slots - failover slots - logical failover slots - logical replication failover slots - etc. These are in many places - comments, function names, constants etc. ~~~ 2. GENERAL - THE s/primary.../the primary.../ s/standby.../the standby.../ Missing "the" problems remain in multiple places in the patch. ~~~ 3. GENERAL - messages I searched all the ereports and elogs (the full list is below only for reference). There are many little quirks: 3a. Sometimes messages say "primary"; sometimes "primary server" etc. Be consistent. 3b. /primary/the primary/ 3c. Sometimes messages include errcode and sometimes they do not; Are they deliberate or are there missing errcodes? 3d. At least one message has unwanted trailing space 3e. Sometimes using errcode and/or errmsg enclosed in parentheses; sometimes not. AFAIK it is not necessary anymore. 3f. Inconsistent terminology "slot" V "failover slots" V "failover logical slots" etc mentioned in the previous review comment #1 3g. Sometimes messages "slot creation aborted"; Sometimes "aborting slot creation". Be consistent. 3h. s/lsn/LSN/ 3i. s/move it backward/move it backwards/ 3j. Sometimes LOG message starts uppercase; Sometimes lowercase. Be consistent. 3k. typo: s/and and/and/ 3l. "worker %d" V "worker%d" ~ Messages: ereport(ERROR, (errmsg("could not receive failover slots dbinfo from the primary server: %s", pchomp(PQerrorMessage(conn->streamConn))))); ereport(ERROR, (errmsg("invalid response from primary server"), errdetail("Could not get failover slots dbinfo: got %d fields, " "expected 1", nfields))); ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid connection string syntax: %s", errcopy))); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("replication slot-sync worker slot %d is " "empty, cannot attach", slot))); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("replication slot-sync worker slot %d is " "already used by another worker, cannot attach", slot))); ereport(ERROR, (errmsg("could not connect to the primary server: %s", err))); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot use replication slot \"%s\" for logical decoding", NameStr(slot->data.name)), errdetail("This slot is being synced from the primary."), errhint("Specify another replication slot."))); ereport(ERROR, (errmsg("could not fetch slot info for slot \"%s\" from" " the primary: %s", remote_slot->name, res->err))); ereport(ERROR, (errmsg("could not fetch slot info for slot \"%s\" from" " the primary: %s", remote_slot->name, res->err))); ereport(ERROR, (errmsg("could not fetch invalidation cause for slot \"%s\" from" " primary: %s", slot_name, res->err))); ereport(ERROR, (errmsg("slot \"%s\" disappeared from the primary", slot_name))); ereport(ERROR, (errmsg("could not fetch failover logical slots info from the primary: %s", res->err))); ereport(ERROR, (errmsg("could not connect to the primary server: %s", err))); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("could not map dynamic shared memory " "segment for slot-sync worker"))); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot drop replication slot \"%s\"", name), errdetail("This slot is being synced from the primary."))); ereport(ERROR, (errmsg("could not receive failover 
slots dbinfo from the primary server: %s", pchomp(PQerrorMessage(conn->streamConn))))); ereport(ERROR, (errmsg("invalid response from primary server"), errdetail("Could not get failover slots dbinfo: got %d fields, " "expected 1", nfields))); ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid connection string syntax: %s", errcopy))); ereport(WARNING, (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), errmsg("out of background worker slots"), errhint("You might need to increase %s.", "max_worker_processes"))); ereport(WARNING, (errmsg("replication slot-sync worker failed to attach to " "worker-pool slot %d", worker_slot))); ereport(WARNING, errmsg("skipping slots synchronization as primary_slot_name " "is not set.")); ereport(WARNING, errmsg("skipping slots synchronization as hot_standby_feedback " "is off.")); ereport(WARNING, errmsg("skipping slots synchronization as dbname is not " "specified in primary_conninfo.")); ereport(WARNING, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("slot-sync wait for slot %s interrupted by promotion, " "slot creation aborted", remote_slot->name))); ereport(WARNING, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("slot-sync wait for slot %s interrupted by promotion, " "slot creation aborted", remote_slot->name))); ereport(WARNING, (errmsg("slot \"%s\" disappeared from the primary, aborting" " slot creation", remote_slot->name))); ereport(WARNING, (errmsg("slot \"%s\" invalidated on primary, aborting" " slot creation", remote_slot->name))); ereport(WARNING, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("slot-sync for slot \"%s\" interrupted by promotion, " "sync not possible", remote_slot->name))); ereport(WARNING, errmsg("skipping sync of slot \"%s\" as the received slot-sync " "lsn %X/%X is ahead of the standby position %X/%X", remote_slot->name, LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), LSN_FORMAT_ARGS(WalRcv->latestWalEnd))); ereport(WARNING, errmsg("not synchronizing slot %s; synchronization would move" " it backward", remote_slot->name)); ereport(LOG, (errmsg("Dropped replication slot \"%s\" ", NameStr(local_slot->data.name)))); ereport(LOG, (errmsg("Added database %d to replication slot-sync " "worker %d; dbcount now: %d", dbid, worker_slot, worker->dbcount))); ereport(LOG, (errmsg("Added database %d to replication slot-sync " "worker %d; dbcount now: %d", dbid, worker_slot, worker->dbcount))); ereport(LOG, (errmsg("Stopping replication slot-sync worker %d", slot))); ereport(LOG, (errmsg("removed database %d from replication slot-sync " "worker %d; dbcount now: %d", wdbid, worker->slot, worker->dbcount))); ereport(LOG, errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" " (%u) to pass local slot LSN (%X/%X) and and catalog xmin (%u)", remote_slot->name, LSN_FORMAT_ARGS(remote_slot->restart_lsn), remote_slot->catalog_xmin, LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), MyReplicationSlot->data.catalog_xmin)); ereport(LOG, errmsg("wait over for remote slot \"%s\" as its LSN (%X/%X)" " and catalog xmin (%u) has now passed local slot LSN" " (%X/%X) and catalog xmin (%u)", remote_slot->name, LSN_FORMAT_ARGS(new_restart_lsn), new_catalog_xmin, LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), MyReplicationSlot->data.catalog_xmin)); ereport(LOG, errmsg("Replication slot-sync worker %d is shutting" " down on receiving SIGINT", MySlotSyncWorker->slot)); ereport(LOG, errmsg("Replication slot-sync worker %d started", worker_slot)); elog(DEBUG1, "allocated dsa for slot-sync worker for 
dbcount: %d", DB_PER_WORKER_ALLOC_INIT); elog(DEBUG1, "logical replication launcher started"); elog(DEBUG2, "slot-sync worker%d's query:%s \n", MySlotSyncWorker->slot, s.data); ~~~ 4. GENERAL - SlotSyncWorker loops When iterating slot-sync workers the code sometimes looks like + for (int i = 0; i < max_slotsync_workers; i++) + { + SlotSyncWorker *w = &LogicalRepCtx->ss_workers[i]; and other times it looks like + for (int widx = 0; widx < max_slotsync_workers; widx++) + { + SlotSyncWorker *worker = &LogicalRepCtx->ss_workers[widx]; etc. It would be better if such loops would use the same loop variable and SlotSyncWorker variable names; consistency will make the code easier to read. ====== Commit message 5. GUC 'enable_syncslot' enables a physical_satndby to synchronize logical replication failover slots from the primary server. s/physical_satndby/physical standby/ ## I think this one is already fixed in the latest v25. ~~~ 6. The logical slots created by slot-sync workers on physical standbys are not allowed to be consumed and dropped. Any attempt to perform logical decoding on such slots will result in an error. ~ SUGGESTION The logical slots created by slot-sync workers on physical standbys are not allowed to be dropped or consumed. Any attempt to perform logical decoding on such slots will result in an error. ====== doc/src/sgml/config.sgml 7. + <para> + Specify dbname in <varname>primary_conninfo</varname> string + to allow synchronization of slots from the primary to standby. + This will only be used for slot synchronization. It is ignored + for streaming. </para> Maybe better to use <literal> for dbname. ~~~ 8. + </varlistentry> + + </variablelist> Extra blank link not needed. ====== .../libpqwalreceiver/libpqwalreceiver.c 9. libpqrcv_get_dbname_from_conninfo + for (opt = opts; opt->keyword != NULL; ++opt) + { + /* If multiple dbnames are used, then the last one will be returned */ s/are used/are specified/ ====== src/backend/replication/logical/launcher.c 10. slotsync_worker_launch_or_reuse + MemoryContext oldcontext; + uint32 alloc_count = 0; + uint32 old_dbcnt = 0; + Oid *old_dbids = NULL; No need to assign these in the declaration, because they get unconditionally assigned before they are inspected anyhow. ~~~ 11. + /* Prepare the new worker. */ + worker->hdr.launch_time = GetCurrentTimestamp(); + worker->hdr.in_use = true; + + /* + * 'proc' and 'slot' will be assigned in ReplSlotSyncWorkerMain when we + * attach this worker to a particular worker-pool slot + */ + worker->hdr.proc = NULL; + worker->slot = -1; + + /* TODO: do we really need 'generation', analyse more here */ + worker->hdr.generation++; + + /* Initial DSA setup for dbids array to hold DB_PER_WORKER_ALLOC_INIT dbs */ + handle = slotsync_dsa_setup(worker); It is confusing for some of the worker members to be initialized here and other worker members (like `dbcount`) to be initialized within the function slotsync_dsa_setup(). It might be better if all the field initialization can be kept together -- e.g. combined in a new function 'slotsync_worker_setup()'. ~~~ 12. 
+ /* Check if current DB is still present in remote-db-list */ + foreach(lc, remote_dbs) + { + WalRcvFailoverSlotsData *failover_slot_data = lfirst(lc); + + if (failover_slot_data->dboid == wdbid) + { + found = true; + break; + } + } + + /* If not found, then delete this db from worker's db-list */ + if (!found) + { + if (dbidx < (worker->dbcount - 1)) + { + /* Shift the DBs and get rid of wdbid */ + memmove(&dbids[dbidx], &dbids[dbidx + 1], + (worker->dbcount - dbidx - 1) * sizeof(Oid)); + } + + worker->dbcount--; + + ereport(LOG, + (errmsg("removed database %d from replication slot-sync " + "worker %d; dbcount now: %d", + wdbid, worker->slot, worker->dbcount))); + } + + /* Else move to next db-position */ + else + { + dbidx++; + } This code might be simpler if you just remove the whole "Else move..." part and instead just increment the `dbidx` at the same time you set found = true;s/ For example, if (failover_slot_data->dboid == wdbid) { /* advance worker to next db-position */ found = true; dbidxid++; break; } ~~~ 13. slotsync_remote_connect +/* + * Connect to the primary server for slotsync purpose and return the connection + * info. + */ +static WalReceiverConn * +slotsync_remote_connect() +{ + WalReceiverConn *wrconn = NULL; + char *err; + char *dbname; No need to assign NULL there. It will be overwritten before it is used. ~~~ 14. Ajins's previous explanation ([1] #27) of why some of the checks have warnings and some do not was helpful; IMO this should be written as a comment in this function. + /* The primary_slot_name is not set */ + if (!WalRcv || WalRcv->slotname[0] == '\0') + { + ereport(WARNING, + errmsg("skipping slots synchronization as primary_slot_name " + "is not set.")); + return NULL; + } + + /* The hot_standby_feedback must be ON for slot-sync to work */ + if (!hot_standby_feedback) + { + ereport(WARNING, + errmsg("skipping slots synchronization as hot_standby_feedback " + "is off.")); + return NULL; + } + + /* The dbname must be specified in primary_conninfo for slot-sync to work */ + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (dbname == NULL) + { + ereport(WARNING, + errmsg("skipping slots synchronization as dbname is not " + "specified in primary_conninfo.")); + return NULL; + } Add a new comment above all those: SUGGESTION /* * Check that other GUC settings (primary_slot_name, hot_standby_feedback, primary_conninfo) * are compatible with slot synchronization. */ ~~~ 15. slotsync_configs_changed +static bool +slotsync_configs_changed() +{ + if ((EnableSyncSlotPreReload != enable_syncslot) || + (HotStandbyFeedbackPreReload != hot_standby_feedback) || + (strcmp(PrimaryConnInfoPreReload, PrimaryConnInfo) != 0) || + (strcmp(PrimarySlotNamePreReload, WalRcv->slotname) != 0)) + { + return true; + } + + return false; +} Might as well write this as a single return. Also, IMO it is more natural to write as "if the <now_value> is different to <prev_value>" instead of the other way around For example: return (enable_syncslot != EnableSyncSlotPreReload) || (hot_standby_feedback != HotStandbyFeedbackPreReload) || (strcmp(PrimaryConnInfo, PrimaryConnInfoPreReload) != 0) || (strcmp(WalRcv->slotname,PrimarySlotNamePreReload) != 0); ~~~ 16. 
slotsync_configs_changed + foreach(lc, slots_dbs) + { + WalRcvFailoverSlotsData *failover_slot_data = lfirst(lc); + SlotSyncWorker *w; + + Assert(OidIsValid(failover_slot_data->dboid)); + + LWLockAcquire(SlotSyncWorkerLock, LW_SHARED); + w = slotsync_worker_find(failover_slot_data->dboid); + LWLockRelease(SlotSyncWorkerLock); + + if (w != NULL) + continue; /* worker is running already */ + + /* + * If we failed to launch this slotsync worker, return and try + * launching the failed and remaining workers in next sync-cycle. But + * change launcher's wait time to minimum of + * wal_retrieve_retry_interval and default wait time to try next + * sync-cycle sooner. + */ + if (!slotsync_worker_launch_or_reuse(failover_slot_data->dboid)) + { + *wait_time = Min(*wait_time, wal_retrieve_retry_interval); + break; + } + } Nit: IMO when the variable scope is small (when you can easily see the declaration and every usage in a few lines) having such long descriptive makes the code *less* instead of more readable. SUGGESTION s/failover_slot_data/slot_data/ OR s/failover_slot_data/sdata/ ====== src/backend/replication/logical/slotsync.c 17. + * This file contains the code for slot-sync workers on physical standby + * to fetch logical failover slots information from the primary server, + * create the slots on the standby and synchronize them periodically. s/on physical standby/on the physical standby/ ~~~ 18. slot_exists_in_list + if (strcmp(remote_slot->name, NameStr(local_slot->data.name)) == 0) + { + /* + * if remote slot is marked as non-conflicting (i.e. not + * invalidated) but local slot is marked as invalidated, then set + * the bool. + */ + if (!remote_slot->conflicting && + local_slot->data.invalidated != RS_INVAL_NONE) + *locally_invalidated = true; + + return true; + } Isn't it better to *always* set that 'locally_invalidated' flag for a found slot? Otherwise, you are assuming that the flag value was initially false, but maybe it was not. SUGGESTION /* * Is the remote slot is marked as non-conflicting (i.e. not * invalidated) when the local slot is marked as invalidated? */ *locally_invalidated = !remote_slot->conflicting && (local_slot->data.invalidated != RS_INVAL_NONE); ~~ 19. get_remote_invalidation_cause + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch invalidation cause for slot \"%s\" from" + " primary: %s", slot_name, res->err))); (already mentioned in general review comment) s/from primary/from the primary/ ~~~ 20. +/* + * Drop obsolete slots + * + * Drop the slots that no longer need to be synced i.e. these either + * do not exist on primary or are no longer enabled as failover slots. (??) s/enabled as failover slots/designated as failover slots/ OR s/enabled as failover slots/enabled for failover ~~~ 21. construct_slot_query +static void +construct_slot_query(StringInfo s, Oid *dbids) +{ + Assert(LWLockHeldByMeInMode(SlotSyncWorkerLock, LW_SHARED)); + + appendStringInfo(s, + "SELECT slot_name, plugin, confirmed_flush_lsn," + " restart_lsn, catalog_xmin, two_phase, conflicting, " + " database FROM pg_catalog.pg_replication_slots" + " WHERE enable_failover=true and database IN "); /WHERE enable_failover=true and database IN/WHERE enable_failover AND database IN/ ### I noticed the code is a tiny bit different in v25, but the review comment is still relevant. ~~~ 22. synchronize_slots +/* + * Synchronize slots. 
+ * + * It gets the failover logical slots info from the primary server for the dbids + * managed by this worker and then updates the slots locally as per the info + * received. It creates the slots if not present on the standby. + * + * It returns nap time for the next sync-cycle. + */ Comment can be re-worded to not say "it" everywhere. ====== src/backend/replication/walsender.c 23. + /* + * Check if the database OID is already in the list, and if so, skip + * this slot. + */ + if (list_member_oid(database_oids_list, dboid)) + continue; Simplify the comment SUGGESTION Skip this slot if the database OID is already in the list. ====== src/backend/utils/activity/wait_event_names.txt 24. +REPL_SLOTSYNC_MAIN "Waiting in main loop of slot-sync worker." +REPL_SLOTSYNC_PRIMARY_CATCHUP "Waiting for primary to catch-up, in slot-sync worker." (this was already mentioned in the general review comment) s/primary/the primary/ ====== src/include/postmaster/bgworker_internals.h 25. #define MAX_PARALLEL_WORKER_LIMIT 1024 +#define MAX_SLOTSYNC_WORKER_LIMIT 50 This constant seems to be not used anywhere except in guc_tables.c where the GUC is defined. IMO you should make use of this in some Assert or a message; Otherwise, might as well just remove it and hardwire the 50 in the guc_tables.c directly. ====== src/include/replication/walreceiver.h 26. WalRcvFailoverSlotsData +/* + * Failover logical slots dbids received from remote. + */ +typedef struct WalRcvFailoverSlotsData +{ + Oid dboid; +} WalRcvFailoverSlotsData; + For now, the only data is `dbids` but maybe one day there will be more stuff, so make the struct comment more generic. SUGGESTION Failover logical slots data received from remote. ====== src/include/replication/worker_internal.h 27. LogicalRepWorkerType + +typedef struct LogicalRepWorker +{ + LogicalWorkerHeader hdr; + + /* What type of worker is this? */ + LogicalRepWorkerType type; + Maybe add some struct-level comments for this. ====== [1] https://www.postgresql.org/message-id/CAFPTHDaqn%2Bm47_vkAToQD6Pe8diut0F0g0bSr8PdcuW6cbSSkQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
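For reference, the query built by construct_slot_query() (review comment #21 above), adjusted for the v25 column name and written in the suggested boolean style, would come out roughly as follows; the database list is whatever the worker's dbids expand to, shown here with illustrative names:

    SELECT slot_name, plugin, confirmed_flush_lsn,
           restart_lsn, catalog_xmin, two_phase, conflicting, database
    FROM pg_catalog.pg_replication_slots
    WHERE failover AND database IN ('postgres', 'appdb');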
Here are some review comments for patch v25-0002 (additional to v25-0002 review comments [1]) ====== src/backend/catalog/system_views.sql 1. @@ -1003,7 +1003,8 @@ CREATE VIEW pg_replication_slots AS L.safe_wal_size, L.two_phase, L.conflicting, - L.failover + L.failover, + L.synced_slot FROM pg_get_replication_slots() AS L LEFT JOIN pg_database D ON (L.datoid = D.oid); AFAICT the patch is missing PG DOCS descriptions for these new view attributes. ====== src/backend/replication/logical/launcher.c 2. slotsync_remove_obsolete_dbs + + /* + * TODO: Take care of of removal of old 'synced' slots for the dbs which + * are no longer eligible for slot-sync. + */ typo: "of of" ~~~ 3. + /* + * Make sure that concerned WAL is received before syncing slot to target + * lsn received from the primary. + * + * This check should never pass as on the primary, we have waited for + * standby's confirmation before updating the logical slot. But to take + * care of any bug in that flow, we should retain this check. + */ + if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd) + { + ereport(LOG, + errmsg_internal("skipping sync of slot \"%s\" as the received slot-sync " + "lsn %X/%X is ahead of the standby position %X/%X", + remote_slot->name, + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + LSN_FORMAT_ARGS(WalRcv->latestWalEnd))); + return; + } Would elog be better here than using ereport(LOG, errmsg_internal...); IIUC it does the same thing? ====== [1] https://www.postgresql.org/message-id/CAHut%2BPspseC03Fhsi%3DOqOtksagspE%2B0MVOhrhhUb64cc_4SE1w%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
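A small sketch of how the new view columns from 0001/0002 could be checked on the standby once the PG DOCS descriptions are added (column names as in the patches, not a released view definition):

    -- on the standby: slots that the slot-sync workers have created or updated
    SELECT slot_name, database, failover, synced_slot, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE synced_slot;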
Hi, On 10/24/23 7:44 AM, Ajin Cherian wrote: > On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, >> SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn); >> } >> >> + /* set failover in the slot, as requested */ >> + slot->data.failover = ctx->failover; >> + >> >> I think we can get rid of this change in CreateDecodingContext(). >> > Yes, I too noticed this in my testing, however just removing this from > CreateDecodingContext will not allow us to change the slot's failover flag > using Alter subscription. Oh right. > I am thinking of moving this change to > StartLogicalReplication prior to calling CreateDecodingContext by > parsing the command options in StartReplicationCmd > without adding it to the LogicalDecodingContext. > Yeah, that looks like a good place to update "failover". Doing more testing and I have a couple of remarks about he current behavior. 1) Let's imagine that: - there is no standby - standby_slot_names is set to a valid slot on the primary (but due to the above, not linked to any standby) - then a create subscription on a subscriber WITH (failover = true) would start the synchronisation but never finish (means leaving a "synchronisation" slot like "pg_32811_sync_24576_7293415241672430356" in place coming from ReplicationSlotNameForTablesync()). That's expected, but maybe we should emit a warning in WalSndWaitForStandbyConfirmation() on the primary when there is a slot part of standby_slot_names which is not active/does not have an active_pid attached to it? 2) When we create a subscription, another slot is created during the subscription synchronization, namely like "pg_16397_sync_16388_7293447291374081805" (coming from ReplicationSlotNameForTablesync()). This extra slot appears to have failover also set to true. So, If the standby refresh the list of slot to sync when the subscription is still synchronizing we'd see things like on the standby: LOG: waiting for remote slot "mysub" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C0034840) and andcatalog xmin (756) LOG: wait over for remote slot "mysub" as its LSN (0/C00368B0) and catalog xmin (756) has now passed local slot LSN (0/C0034840)and catalog xmin (756) LOG: waiting for remote slot "pg_16397_sync_16388_7293447291374081805" LSN (0/C0034808) and catalog xmin (756) to pass localslot LSN (0/C00368E8) and and catalog xmin (756) WARNING: slot "pg_16397_sync_16388_7293447291374081805" disappeared from the primary, aborting slot creation I'm not sure this "pg_16397_sync_16388_7293447291374081805" should have failover set to true. If there is a failover during the subscription creation, better to re-launch the subscription instead? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
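Regarding remark 1), until such a warning exists, the situation can at least be spotted from SQL on the primary; a sketch assuming the patch's standby_slot_names GUC:

    -- slots listed in standby_slot_names that currently have no walsender attached,
    -- i.e. the ones that would make failover-enabled walsenders wait indefinitely
    SELECT slot_name, active, restart_lsn
    FROM pg_replication_slots
    WHERE slot_name = ANY (string_to_array(
              replace(current_setting('standby_slot_names'), ' ', ''), ','))
      AND NOT active;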
On Tue, Oct 24, 2023 at 3:35 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/24/23 7:44 AM, Ajin Cherian wrote: > > On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> @@ -602,6 +602,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, > >> SnapBuildSetTwoPhaseAt(ctx->snapshot_builder, start_lsn); > >> } > >> > >> + /* set failover in the slot, as requested */ > >> + slot->data.failover = ctx->failover; > >> + > >> > >> I think we can get rid of this change in CreateDecodingContext(). > >> > > Yes, I too noticed this in my testing, however just removing this from > > CreateDecodingContext will not allow us to change the slot's failover flag > > using Alter subscription. > > Oh right. > > > I am thinking of moving this change to > > StartLogicalReplication prior to calling CreateDecodingContext by > > parsing the command options in StartReplicationCmd > > without adding it to the LogicalDecodingContext. > > > > Yeah, that looks like a good place to update "failover". > > Doing more testing and I have a couple of remarks about he current behavior. > > 1) Let's imagine that: > > - there is no standby > - standby_slot_names is set to a valid slot on the primary (but due to the above, not linked to any standby) > - then a create subscription on a subscriber WITH (failover = true) would start the > synchronisation but never finish (means leaving a "synchronisation" slot like "pg_32811_sync_24576_7293415241672430356" > in place coming from ReplicationSlotNameForTablesync()). > > That's expected, but maybe we should emit a warning in WalSndWaitForStandbyConfirmation() on the primary when there is > a slot part of standby_slot_names which is not active/does not have an active_pid attached to it? > Agreed, Will do that. > 2) When we create a subscription, another slot is created during the subscription synchronization, namely > like "pg_16397_sync_16388_7293447291374081805" (coming from ReplicationSlotNameForTablesync()). > > This extra slot appears to have failover also set to true. > > So, If the standby refresh the list of slot to sync when the subscription is still synchronizing we'd see things like > on the standby: > > LOG: waiting for remote slot "mysub" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C0034840) and andcatalog xmin (756) > LOG: wait over for remote slot "mysub" as its LSN (0/C00368B0) and catalog xmin (756) has now passed local slot LSN (0/C0034840)and catalog xmin (756) > LOG: waiting for remote slot "pg_16397_sync_16388_7293447291374081805" LSN (0/C0034808) and catalog xmin (756) to passlocal slot LSN (0/C00368E8) and and catalog xmin (756) > WARNING: slot "pg_16397_sync_16388_7293447291374081805" disappeared from the primary, aborting slot creation > > I'm not sure this "pg_16397_sync_16388_7293447291374081805" should have failover set to true. If there is a failover > during the subscription creation, better to re-launch the subscription instead? > 'Failover' property of subscription is carried to the tablesync-slot in recent versions only with the intent that if failover happens during create-sub during table-sync of large tables, then users should be able to start from that point onward on the new primary. But yes, the above scenario is highly probable where-in no activity is happening on primary and thus the table-sync slot is waiting for its creation during sync on standby. 
So I agree: to simplify things, we can skip table-sync slot syncing on the standby and document this behaviour. thanks Shveta
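For anyone checking this on a running setup: the tablesync slots in question follow the pg_<suboid>_sync_<reloid>_<sysid> naming visible in the log excerpts above, so their current failover setting on the primary can be inspected with something like this (failover column as added by 0001):

    SELECT slot_name, failover
    FROM pg_replication_slots
    WHERE slot_name LIKE 'pg\_%\_sync\_%';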
On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/23/23 2:56 PM, shveta malik wrote: > > On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > >> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > >> is new synced slot(s) to be created on the standby. Do we want to keep this behavior > >> for V1? > >> > > > > I think for the slotsync workers case, we should reduce the naptime in > > the launcher to say 30sec and retain the default one of 3mins for > > subscription apply workers. Thoughts? > > > > Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new > API on the standby that would refresh the list of sync slot at wish, thoughts? > Do you mean API to refresh list of DBIDs rather than sync-slots? As per current design, launcher gets DBID lists for all the failover slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. These dbids are then distributed among max slot-sync workers and then they fetch slots for the concerned DBIDs at regular intervals of 10ms (WORKER_DEFAULT_NAPTIME_MS) and create/update those locally. thanks Shveta
On Tue, Oct 24, 2023 at 3:35 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/24/23 7:44 AM, Ajin Cherian wrote: > > On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > 2) When we create a subscription, another slot is created during the subscription synchronization, namely > like "pg_16397_sync_16388_7293447291374081805" (coming from ReplicationSlotNameForTablesync()). > > This extra slot appears to have failover also set to true. > > So, If the standby refresh the list of slot to sync when the subscription is still synchronizing we'd see things like > on the standby: > > LOG: waiting for remote slot "mysub" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C0034840) and andcatalog xmin (756) > LOG: wait over for remote slot "mysub" as its LSN (0/C00368B0) and catalog xmin (756) has now passed local slot LSN (0/C0034840)and catalog xmin (756) > LOG: waiting for remote slot "pg_16397_sync_16388_7293447291374081805" LSN (0/C0034808) and catalog xmin (756) to passlocal slot LSN (0/C00368E8) and and catalog xmin (756) > WARNING: slot "pg_16397_sync_16388_7293447291374081805" disappeared from the primary, aborting slot creation > > I'm not sure this "pg_16397_sync_16388_7293447291374081805" should have failover set to true. If there is a failover > during the subscription creation, better to re-launch the subscription instead? > But note that the subscription doesn't wait for the completion of tablesync. So, how will we deal with that? Also, this situation is the same for non-tablesync slots as well. I have given another option in the email [1] which is to enable failover even for the main slot after all tables are in ready state, something similar to what we do for two_phase. [1] - https://www.postgresql.org/message-id/CAA4eK1J6BqO5%3DueFAQO%2BaYyHLaU-oCHrrVFJqHS-i0Ce9aPY2w%40mail.gmail.com -- With Regards, Amit Kapila.
Hi, On 10/25/23 5:00 AM, shveta malik wrote: > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 10/23/23 2:56 PM, shveta malik wrote: >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >> >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior >>>> for V1? >>>> >>> >>> I think for the slotsync workers case, we should reduce the naptime in >>> the launcher to say 30sec and retain the default one of 3mins for >>> subscription apply workers. Thoughts? >>> >> >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new >> API on the standby that would refresh the list of sync slot at wish, thoughts? >> > > Do you mean API to refresh list of DBIDs rather than sync-slots? > As per current design, launcher gets DBID lists for all the failover > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. I mean an API to get a newly created slot on the primary being created/synced on the standby at wish. Also let's imagine this scenario: - create logical_slot1 on the primary (and don't start using it) Then on the standby we'll get things like: 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752)to pass local slot LSN (0/C0049530) and and catalog xmin (754) That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot restart_lsn to a value < at the corresponding restart_lsn slot on the primary. - create logical_slot2 on the primary (and start using it) Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary that would produce things like: 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) With this new dedicated API, it will be: - clear that the API call is "hanging" until there is some activity on the newly created slot (currently there is "waiting for remote slot " message in the logfile as mentioned above but I'm not sure that's enough) - be possible to create/sync logical_slot2 in the example above without waiting for activity on logical_slot1. Maybe we should change our current algorithm during slot creation so that a newly created inactive slot on the primary does not block other newly created "active" slots on the primary to be created on the standby? Depending on how we implement that, the new API may not be needed at all. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
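The scenario above is easy to stage on the primary (plugin and the position of the failover argument are illustrative, per the earlier sketches):

    -- logical_slot1: created with failover enabled but never consumed
    SELECT pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, false, true);

    -- logical_slot2: created the same way and then actively consumed
    SELECT pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, false, true);
    SELECT count(*) FROM pg_logical_slot_get_changes('logical_slot2', NULL, NULL);

    -- logical_slot1's restart_lsn and catalog_xmin stay behind; per the observation
    -- above, that is what the standby's sync worker ends up waiting on, which also
    -- holds up creating/syncing logical_slot2 on the standby
    SELECT slot_name, restart_lsn, catalog_xmin, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_name IN ('logical_slot1', 'logical_slot2');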
Hi, On 10/25/23 6:57 AM, Amit Kapila wrote: > On Tue, Oct 24, 2023 at 3:35 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 10/24/23 7:44 AM, Ajin Cherian wrote: >>> On Mon, Oct 23, 2023 at 11:22 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >> >> 2) When we create a subscription, another slot is created during the subscription synchronization, namely >> like "pg_16397_sync_16388_7293447291374081805" (coming from ReplicationSlotNameForTablesync()). >> >> This extra slot appears to have failover also set to true. >> >> So, If the standby refresh the list of slot to sync when the subscription is still synchronizing we'd see things like >> on the standby: >> >> LOG: waiting for remote slot "mysub" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C0034840) and and catalog xmin (756) >> LOG: wait over for remote slot "mysub" as its LSN (0/C00368B0) and catalog xmin (756) has now passed local slot LSN (0/C0034840) and catalog xmin (756) >> LOG: waiting for remote slot "pg_16397_sync_16388_7293447291374081805" LSN (0/C0034808) and catalog xmin (756) to pass local slot LSN (0/C00368E8) and and catalog xmin (756) >> WARNING: slot "pg_16397_sync_16388_7293447291374081805" disappeared from the primary, aborting slot creation >> >> I'm not sure this "pg_16397_sync_16388_7293447291374081805" should have failover set to true. If there is a failover >> during the subscription creation, better to re-launch the subscription instead? >> > > But note that the subscription doesn't wait for the completion of > tablesync. Right. > So, how will we deal with that? Also, this situation is the > same for non-tablesync slots as well. I have given another option in > the email [1] which is to enable failover even for the main slot after > all tables are in ready state, something similar to what we do for > two_phase. Oh right, that looks like a better option (enable failover even for the main slot after all tables are in ready state). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
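The "enable failover for the main slot only after all tables are in ready state" approach discussed above mirrors the existing two_phase handling, where the apply worker relies on roughly the check done by AllTablesyncsReady() before flipping the pending state. As a rough illustration only (not the patch's code), the same condition can be inspected from SQL; 'mysub' is a placeholder subscription name:

    -- Returns true once every table of subscription 'mysub' has reached the
    -- 'r' (ready) state, i.e. table synchronization has finished.
    SELECT NOT EXISTS (
             SELECT 1
             FROM   pg_subscription_rel sr
             JOIN   pg_subscription s ON s.oid = sr.srsubid
             WHERE  s.subname = 'mysub'
             AND    sr.srsubstate <> 'r'
           ) AS all_tables_ready;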
Hi, On 10/9/23 12:30 PM, shveta malik wrote: > PFA v22 patch-set. It has below changes: > > patch 001: > 1) Now physical walsender wakes up logical walsender(s) by using a new > CV as suggested in [1] Thanks! I think that works fine as long as the standby is up and running and catching up. The problem I see with the current WalSndWaitForStandbyConfirmation() implementation is that if the standby is not running then: + for (;;) + { + ListCell *l; + long sleeptime = -1; will loop until we reach the "terminating walsender process due to replication timeout" if we explicitly want to end with SIGINT or friends. For example a scenario like: - standby down - pg_recvlogical running then CTRL-C on pg_recvlogical would not "respond" immediately but when we reach the replication timeout. So it seems that we should use something like WalSndWait() instead of ConditionVariableTimedSleep() here: + /* + * Sleep until other physical walsenders awaken us or until a timeout + * occurs. + */ + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); + + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, sleeptime, + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION); In that case I think that WalSndWait() should take care of the new CV WalSndCtl->wal_confirm_rcv_cv too. The wait on the socket should allow us to stop waiting when, for example, CTRL-C on pg_recvlogical is triggered. Then we would need to deal with this scenario: Standby down or not catching up and exited WalSndWait() due to the socket to break the loop or shutdown the walsender. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
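To make the suggestion above concrete, here is a minimal sketch (not the patch itself) of a wait loop built around WalSndWait(), so that the new condition variable, the client socket, and the timeout can all end the wait. WalSndCtl->wal_confirm_rcv_cv and WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION are the patch's additions; StandbyConfirmedFlush() and target_lsn are placeholders standing in for "all standbys listed in standby_slot_names have confirmed this LSN":

    /*
     * Sketch only: a ConditionVariableBroadcast() on the new CV sets our
     * latch, and WalSndWait() waits on the latch, the socket, and the timeout.
     */
    ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv);

    for (;;)
    {
        long        sleeptime;

        /* Clear any already-pending wakeups, then honour interrupts. */
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();

        if (StandbyConfirmedFlush(target_lsn))      /* placeholder helper */
            break;

        /* Stop waiting if the client has already ended the copy stream. */
        if (streamingDoneReceiving && streamingDoneSending)
            break;

        sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());

        /*
         * Unlike ConditionVariableTimedSleep(), WalSndWait() also watches
         * the walsender socket, so e.g. a CTRL-C on pg_recvlogical wakes us
         * up promptly instead of only at wal_sender_timeout.
         */
        WalSndWait(WL_SOCKET_READABLE, sleeptime,
                   WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION);
    }

    ConditionVariableCancelSleep();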
Dear Shveta, > PFA v25 patch set. The changes are: Thanks for making the patch! It seems that there are lots of comments already, so let me put some high-level comments for 0001. Sorry if there are duplicated comments. 1. The patch does not seem to consider the case where the failover option of the replication slot and that of the subscription differ. Currently the slot's option will simply be overwritten by the subscription's one. Actually, I'm not sure which specification is better. Regarding two_phase, 2PC will be decoded only when both of the settings are true. Should we follow the same rule? 2. Currently ctx->failover is set only in pgoutput_startup(), and I'm not sure that is OK. Can we set the parameter in CreateDecodingContext() or a similar function instead? Because IIUC the current placement means that only slots which use pgoutput can wait. Other output plugins must understand the change and set the failover flag as well - I feel that is not good. E.g., one might forget to enable the parameter in test_decoding. Regarding the two_phase parameter, setting it at the plugin layer is good because it strongly affects the output. As for failover, it is not related to the content, so all slots should be able to enable it. I think CreateDecodingContext() or StartupDecodingContext() is the common path. Or is this out of scope for now? Best Regards, Hayato Kuroda FUJITSU LIMITED
On Wed, Oct 25, 2023 at 8:49 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/9/23 12:30 PM, shveta malik wrote: > > PFA v22 patch-set. It has below changes: > > > > patch 001: > > 1) Now physical walsender wakes up logical walsender(s) by using a new > > CV as suggested in [1] > > Thanks! > > I think that works fine as long as the standby is up and running and catching up. > > The problem I see with the current WalSndWaitForStandbyConfirmation() implementation > is that if the standby is not running then: > > + for (;;) > + { > + ListCell *l; > + long sleeptime = -1; > > will loop until we reach the "terminating walsender process due to replication timeout" if we > explicitly want to end with SIGINT or friends. > > For example a scenario like: > > - standby down > - pg_recvlogical running > > then CTRL-C on pg_recvlogical would not "respond" immediately but when we reach the replication timeout. > > So it seems that we should use something like WalSndWait() instead of ConditionVariableTimedSleep() here: > > + /* > + * Sleep until other physical walsenders awaken us or until a timeout > + * occurs. > + */ > + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); > + > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, sleeptime, > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION); > > In that case I think that WalSndWait() should take care of the new CV WalSndCtl->wal_confirm_rcv_cv too. > The wait on the socket should allow us to stop waiting when, for example, CTRL-C on pg_recvlogical is triggered. > > Then we would need to deal with this scenario: Standby down or not catching up and exited WalSndWait() due to the socket > to break the loop or shutdown the walsender. > > Thoughts? > Good point, I think we should enhance the WalSndWait() logic to address this case. Additionally, I think we should ensure that WalSndWaitForWal() shouldn't wait twice once for wal_flush and a second time for wal to be replayed by physical standby. It should be okay to just wait for Wal to be replayed by physical standby when applicable, otherwise, just wait for Wal to flush as we are doing now. Does that make sense? -- With Regards, Amit Kapila.
Hi, On 10/26/23 10:40 AM, Amit Kapila wrote: > On Wed, Oct 25, 2023 at 8:49 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> > > Good point, I think we should enhance the WalSndWait() logic to > address this case. Agree. I think it would need to take care of the new CV and probably provide a way for the caller to detect it stopped waiting due to the socket (I don't think it can find out currently). > Additionally, I think we should ensure that > WalSndWaitForWal() shouldn't wait twice once for wal_flush and a > second time for wal to be replayed by physical standby. It should be > okay to just wait for Wal to be replayed by physical standby when > applicable, otherwise, just wait for Wal to flush as we are doing now. > Does that make sense? Yeah, I think so. What about moving WalSndWaitForStandbyConfirmation() outside of WalSndWaitForWal() and call one or the other in logical_read_xlog_page()? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Oct 26, 2023 at 5:38 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/26/23 10:40 AM, Amit Kapila wrote: > > On Wed, Oct 25, 2023 at 8:49 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > > > > Good point, I think we should enhance the WalSndWait() logic to > > address this case. > > Agree. I think it would need to take care of the new CV and probably > provide a way for the caller to detect it stopped waiting due to the socket > (I don't think it can find out currently). > > > Additionally, I think we should ensure that > > WalSndWaitForWal() shouldn't wait twice once for wal_flush and a > > second time for wal to be replayed by physical standby. It should be > > okay to just wait for Wal to be replayed by physical standby when > > applicable, otherwise, just wait for Wal to flush as we are doing now. > > Does that make sense? > > Yeah, I think so. What about moving WalSndWaitForStandbyConfirmation() > outside of WalSndWaitForWal() and call one or the other in logical_read_xlog_page()? > I think we need to somehow integrate the logic of both functions. Let us see what the patch author has to say about this. -- With Regards, Amit Kapila.
On Thu, Oct 26, 2023 at 12:38 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > PFA v25 patch set. The changes are: > > Thanks for making the patch! It seems that there are lots of comments, so > I can put some high-level comments for 0001. > Sorry if there are duplicated comments. > > 1. > The patch seemed not to consider the case that failover option between replication > slot and subscription were different. Currently slot option will be overwritten > by subscription one. > > Actually, I'm not sure what specification is better. Regarding the two_phase, > 2PC will be decoded only when the both of settings are true. Should we follow? > > 2. > Currently ctx->failover is set only in the pgoutput_startup(), but not sure it is OK. > Can we change the parameter in CreateDecodingContext() or similar functions? > > Because IIUC it means that only slots which have pgoutput can wait. Other > output plugins must understand the change and set faliover flag as well - > I felt it is not good. E.g., you might miss to enable the parameter in test_decoding. > > Regarding the two_phase parameter, setting on plugin layer is good because it > quite affects the output. As for the failover, it is not related with the > content so that all of slots should be enabled. > Both of your points seem valid to me. However, I think they should be addressed once we make option 'failover' behave similar to the '2PC' option as per discussion [1]. [1] - https://www.postgresql.org/message-id/b099ebc2-68fd-4c08-87ce-65fc4cb24121%40gmail.com -- With Regards, Amit Kapila.
On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/25/23 5:00 AM, shveta malik wrote: > > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 10/23/23 2:56 PM, shveta malik wrote: > >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >> > >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior > >>>> for V1? > >>>> > >>> > >>> I think for the slotsync workers case, we should reduce the naptime in > >>> the launcher to say 30sec and retain the default one of 3mins for > >>> subscription apply workers. Thoughts? > >>> > >> > >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new > >> API on the standby that would refresh the list of sync slot at wish, thoughts? > >> > > > > Do you mean API to refresh list of DBIDs rather than sync-slots? > > As per current design, launcher gets DBID lists for all the failover > > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. > > I mean an API to get a newly created slot on the primary being created/synced on > the standby at wish. > > Also let's imagine this scenario: > > - create logical_slot1 on the primary (and don't start using it) > > Then on the standby we'll get things like: > > 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752)to pass local slot LSN (0/C0049530) and and catalog xmin (754) > > That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot > restart_lsn to a value < at the corresponding restart_lsn slot on the primary. > > - create logical_slot2 on the primary (and start using it) > > Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary > that would produce things like: > > 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) > > With this new dedicated API, it will be: > > - clear that the API call is "hanging" until there is some activity on the newly created slot > (currently there is "waiting for remote slot " message in the logfile as mentioned above but > I'm not sure that's enough) > I think even if we provide such an API, we need to have logic to get the slots from the primary and create them. Say, even if the user used the APIs, there may still be some new slots that the sync worker needs to create. I think it might be better to provide a view for users to view the current state of sync. For example, in the above case, we can say "waiting for the primary to advance remote LSN" or something like that. -- With Regards, Amit Kapila.
On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/25/23 5:00 AM, shveta malik wrote: > > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 10/23/23 2:56 PM, shveta malik wrote: > >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >> > >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior > >>>> for V1? > >>>> > >>> > >>> I think for the slotsync workers case, we should reduce the naptime in > >>> the launcher to say 30sec and retain the default one of 3mins for > >>> subscription apply workers. Thoughts? > >>> > >> > >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new > >> API on the standby that would refresh the list of sync slot at wish, thoughts? > >> > > > > Do you mean API to refresh list of DBIDs rather than sync-slots? > > As per current design, launcher gets DBID lists for all the failover > > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. > > I mean an API to get a newly created slot on the primary being created/synced on > the standby at wish. > > Also let's imagine this scenario: > > - create logical_slot1 on the primary (and don't start using it) > > Then on the standby we'll get things like: > > 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752)to pass local slot LSN (0/C0049530) and and catalog xmin (754) > > That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot > restart_lsn to a value < at the corresponding restart_lsn slot on the primary. > > - create logical_slot2 on the primary (and start using it) > > Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary > that would produce things like: > > 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) > > With this new dedicated API, it will be: > > - clear that the API call is "hanging" until there is some activity on the newly created slot > (currently there is "waiting for remote slot " message in the logfile as mentioned above but > I'm not sure that's enough) > > - be possible to create/sync logical_slot2 in the example above without waiting for activity > on logical_slot1. > > Maybe we should change our current algorithm during slot creation so that a newly created inactive > slot on the primary does not block other newly created "active" slots on the primary to be created > on the standby? Depending on how we implement that, the new API may not be needed at all. > > Thoughts? > I discussed this with my colleague Hou-San and we think that one possibility could be to somehow accelerate the increment of restart_lsn on primary. This can be achieved by connecting to the remote and executing pg_log_standby_snapshot() at reasonable intervals while waiting on standby during slot creation. This may increase speed to a reasonable extent w/o having to wait for the user or bgwriter to do the same for us. The current logical decoding uses a similar approach to speed up the slot creation. I refer to usage of LogStandbySnapshot in SnapBuildWaitSnapshot() and ReplicationSlotReserveWal()). 
Thoughts? thanks Shveta
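The pg_log_standby_snapshot() idea above is easy to try by hand while the standby keeps logging "waiting for remote slot ...": running the function on the primary writes a running-xacts (standby snapshot) record, which is what lets an otherwise idle logical slot make progress. pg_log_standby_snapshot() is an existing function (PostgreSQL 16 and later); the shell loop is only a crude stand-in for the proposed automatic call at intervals:

    -- On the primary: emit a standby-snapshot WAL record so idle logical
    -- slots can advance without any user activity.
    SELECT pg_log_standby_snapshot();

    -- Stand-in for calling it "at reasonable intervals":
    --   while true; do psql -c 'SELECT pg_log_standby_snapshot();'; sleep 10; done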
On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/25/23 5:00 AM, shveta malik wrote: > > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 10/23/23 2:56 PM, shveta malik wrote: > >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >> > >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior > >>>> for V1? > >>>> > >>> > >>> I think for the slotsync workers case, we should reduce the naptime in > >>> the launcher to say 30sec and retain the default one of 3mins for > >>> subscription apply workers. Thoughts? > >>> > >> > >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new > >> API on the standby that would refresh the list of sync slot at wish, thoughts? > >> > > > > Do you mean API to refresh list of DBIDs rather than sync-slots? > > As per current design, launcher gets DBID lists for all the failover > > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. > > I mean an API to get a newly created slot on the primary being created/synced on > the standby at wish. > > Also let's imagine this scenario: > > - create logical_slot1 on the primary (and don't start using it) > > Then on the standby we'll get things like: > > 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752)to pass local slot LSN (0/C0049530) and and catalog xmin (754) > > That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot > restart_lsn to a value < at the corresponding restart_lsn slot on the primary. > > - create logical_slot2 on the primary (and start using it) > > Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary > that would produce things like: > 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) Slight correction to above. As soon as we start activity on logical_slot2, it will impact all the slots on primary, as the WALs are consumed by all the slots. So even if there is activity on logical_slot2, logical_slot1 creation on standby will be unblocked and it will then move to logical_slot2 creation. eg: --on standby: 2023-10-27 15:15:46.069 IST [696884] LOG: waiting for remote slot "mysubnew1_1" LSN (0/3C97970) and catalog xmin (756) to pass local slot LSN (0/3C979A8) and and catalog xmin (756) on primary: newdb1=# select now(); now ---------------------------------- 2023-10-27 15:15:51.504835+05:30 (1 row) --activity on mysubnew1_3 newdb1=# insert into tab1_3 values(1); INSERT 0 1 newdb1=# select now(); now ---------------------------------- 2023-10-27 15:15:54.651406+05:30 --on standby, mysubnew1_1 is unblocked. 2023-10-27 15:15:56.223 IST [696884] LOG: wait over for remote slot "mysubnew1_1" as its LSN (0/3C97A18) and catalog xmin (757) has now passed local slot LSN (0/3C979A8) and catalog xmin (756) My Setup: mysubnew1_1 -->mypubnew1_1 -->tab1_1 mysubnew1_3 -->mypubnew1_3-->tab1_3 thanks Shveta
On Fri, Oct 27, 2023 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 10/25/23 5:00 AM, shveta malik wrote: > > > On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > >> > > >> Hi, > > >> > > >> On 10/23/23 2:56 PM, shveta malik wrote: > > >>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > > >>> <bertranddrouvot.pg@gmail.com> wrote: > > >> > > >>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > > >>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior > > >>>> for V1? > > >>>> > > >>> > > >>> I think for the slotsync workers case, we should reduce the naptime in > > >>> the launcher to say 30sec and retain the default one of 3mins for > > >>> subscription apply workers. Thoughts? > > >>> > > >> > > >> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new > > >> API on the standby that would refresh the list of sync slot at wish, thoughts? > > >> > > > > > > Do you mean API to refresh list of DBIDs rather than sync-slots? > > > As per current design, launcher gets DBID lists for all the failover > > > slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. > > > > I mean an API to get a newly created slot on the primary being created/synced on > > the standby at wish. > > > > Also let's imagine this scenario: > > > > - create logical_slot1 on the primary (and don't start using it) > > > > Then on the standby we'll get things like: > > > > 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin(752) to pass local slot LSN (0/C0049530) and and catalog xmin (754) > > > > That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot > > restart_lsn to a value < at the corresponding restart_lsn slot on the primary. > > > > - create logical_slot2 on the primary (and start using it) > > > > Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary > > that would produce things like: > > 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) > > > Slight correction to above. As soon as we start activity on > logical_slot2, it will impact all the slots on primary, as the WALs > are consumed by all the slots. So even if there is activity on > logical_slot2, logical_slot1 creation on standby will be unblocked and > it will then move to logical_slot2 creation. eg: > > --on standby: > 2023-10-27 15:15:46.069 IST [696884] LOG: waiting for remote slot > "mysubnew1_1" LSN (0/3C97970) and catalog xmin (756) to pass local > slot LSN (0/3C979A8) and and catalog xmin (756) > > on primary: > newdb1=# select now(); > now > ---------------------------------- > 2023-10-27 15:15:51.504835+05:30 > (1 row) > > --activity on mysubnew1_3 > newdb1=# insert into tab1_3 values(1); > INSERT 0 1 > newdb1=# select now(); > now > ---------------------------------- > 2023-10-27 15:15:54.651406+05:30 > > > --on standby, mysubnew1_1 is unblocked. 
> 2023-10-27 15:15:56.223 IST [696884] LOG: wait over for remote slot > "mysubnew1_1" as its LSN (0/3C97A18) and catalog xmin (757) has now > passed local slot LSN (0/3C979A8) and catalog xmin (756) > > My Setup: > mysubnew1_1 -->mypubnew1_1 -->tab1_1 > mysubnew1_3 -->mypubnew1_3-->tab1_3 > > thanks > Shveta PFA v26 patches. The changes are: 1) 'Failover' in the main slot is now set when the table synchronization phase is finished. So even when failover is enabled for a subscription, the internal failover state remains temporarily “pending” until the initialization phase completes. 2) If the standby is down, but standby_slot_names has that slot name, we emit a warning now while waiting for that standby. 3) Fixed bug where pg_logical_slot_get_changes was resetting failover property of slot. Thanks Ajin for providing the fix. 4) Fixed bug where standby_slot_names_list was not initialized for non-walsender cases making pg_logical_slot_get_changes() to proceed w/o waiting for standbys. 5) Fixed a bug where standby_slot_names_list was freed (due to free of per_query context in non-walsender cases) but was not nullified and thus next call was using this freed pointer and was crashing. 6) Improved wait_for_primary_slot_catchup(), we now fetch remote-conflicting(invalidation) too and abort the wait and slot creation if the slot on primary is invalidated. 7) Slot-sync workers now wait for cascading standby's confirmation before updating logical synced slots on first standby. First 5 changes are in patch001, 6th one is in patch002. For 7th, I have created a new patch (003) to separate out the additional changes needed for cascading standbys. ========== Open questions regarding change for pt 1 above: a) I think we should restrict the 'alter-sub set failover' when failover-state is currently in 'p' (pending) state i.e. table-sync is going over. Once table-sync is over, then toggle of 'failover' should be allowed using alter-subscription. b) Currently I have restricted 'alter subscription.. refresh publication with copy=true' when failover=true (on a similar line of two-phase). The reason being, refresh with copy=true will go for table-sync again and since failover was set in main-slot after table-sync was done, it will need going through the same transition of 'p' to 'e' for main slot making it unsyncable for that time. Should it be allowed? Currently: newdb1=# ALTER SUBSCRIPTION mysubnew1_1 REFRESH PUBLICATION WITH (copy_data=true); ERROR: ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed when failover is enabled HINT: Use ALTER SUBSCRIPTION ... REFRESH with copy_data = false, or use DROP/CREATE SUBSCRIPTION. Thoughts on above queries? thanks Shveta
Attachment
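The 'p' (pending) to 'e' (enabled) failover-state transition described in the v26 notes above is modeled on the existing two_phase state machine. For reference, the two_phase analogue is already visible in the catalog; the failover counterpart's catalog representation is whatever the patch adds, so only the column shown below exists today:

    -- The two_phase precedent: 'd' = disabled, 'p' = pending (tablesync still
    -- running), 'e' = enabled once all tables have reached the ready state.
    SELECT subname, subtwophasestate
    FROM   pg_subscription;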
Hi, On 10/27/23 11:56 AM, shveta malik wrote: > On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 10/25/23 5:00 AM, shveta malik wrote: >>> On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> On 10/23/23 2:56 PM, shveta malik wrote: >>>>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand >>>>> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>>>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there >>>>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior >>>>>> for V1? >>>>>> >>>>> >>>>> I think for the slotsync workers case, we should reduce the naptime in >>>>> the launcher to say 30sec and retain the default one of 3mins for >>>>> subscription apply workers. Thoughts? >>>>> >>>> >>>> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new >>>> API on the standby that would refresh the list of sync slot at wish, thoughts? >>>> >>> >>> Do you mean API to refresh list of DBIDs rather than sync-slots? >>> As per current design, launcher gets DBID lists for all the failover >>> slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. >> >> I mean an API to get a newly created slot on the primary being created/synced on >> the standby at wish. >> >> Also let's imagine this scenario: >> >> - create logical_slot1 on the primary (and don't start using it) >> >> Then on the standby we'll get things like: >> >> 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin (752)to pass local slot LSN (0/C0049530) and and catalog xmin (754) >> >> That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot >> restart_lsn to a value < at the corresponding restart_lsn slot on the primary. >> >> - create logical_slot2 on the primary (and start using it) >> >> Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary >> that would produce things like: >> 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) > > > Slight correction to above. As soon as we start activity on > logical_slot2, it will impact all the slots on primary, as the WALs > are consumed by all the slots. So even if there is activity on > logical_slot2, logical_slot1 creation on standby will be unblocked and > it will then move to logical_slot2 creation. eg: > > --on standby: > 2023-10-27 15:15:46.069 IST [696884] LOG: waiting for remote slot > "mysubnew1_1" LSN (0/3C97970) and catalog xmin (756) to pass local > slot LSN (0/3C979A8) and and catalog xmin (756) > > on primary: > newdb1=# select now(); > now > ---------------------------------- > 2023-10-27 15:15:51.504835+05:30 > (1 row) > > --activity on mysubnew1_3 > newdb1=# insert into tab1_3 values(1); > INSERT 0 1 > newdb1=# select now(); > now > ---------------------------------- > 2023-10-27 15:15:54.651406+05:30 > > > --on standby, mysubnew1_1 is unblocked. 
> 2023-10-27 15:15:56.223 IST [696884] LOG: wait over for remote slot > "mysubnew1_1" as its LSN (0/3C97A18) and catalog xmin (757) has now > passed local slot LSN (0/3C979A8) and catalog xmin (756) > > My Setup: > mysubnew1_1 -->mypubnew1_1 -->tab1_1 > mysubnew1_3 -->mypubnew1_3-->tab1_3 > Agree with your test case, but in my case I was not using pub/sub. I was not clear, so when I said: >> - create logical_slot1 on the primary (and don't start using it) I meant don't start decoding from it (like using pg_recvlogical() or pg_logical_slot_get_changes()). By using pub/sub the "don't start using it" is not satisfied. My test case is: " SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true); SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true); pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f - " Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 10/27/23 10:51 AM, shveta malik wrote: > On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > I discussed this with my colleague Hou-San and we think that one > possibility could be to somehow accelerate the increment of > restart_lsn on primary. This can be achieved by connecting to the > remote and executing pg_log_standby_snapshot() at reasonable intervals > while waiting on standby during slot creation. This may increase speed > to a reasonable extent w/o having to wait for the user or bgwriter to > do the same for us. The current logical decoding uses a similar > approach to speed up the slot creation. I refer to usage of > LogStandbySnapshot in SnapBuildWaitSnapshot() and > ReplicationSlotReserveWal()). > Thoughts? > I think those are 2 distinct areas. My concern was more about the case when there is no activity at all on a newly created slot on the primary. The slot is created on the standby, but then we loop until there is activity on this slot on the primary. That's the test case I described in [1]. [1]: https://www.postgresql.org/message-id/afe4ab6c-dde3-48ea-acd8-6f6052c7b8fd%40gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 10/27/23 10:35 AM, Amit Kapila wrote: > On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand > > I think even if we provide such an API, we need to have logic to get > the slots from the primary and create them. Yeah, my idea was to add an API (in addition to what is already in place). > Say, even if the user used > the APIs, there may still be some new slots that the sync worker needs > to create. Right. > I think it might be better to provide a view for users to > view the current state of sync. For example, in the above case, we can > say "waiting for the primary to advance remote LSN" or something like > that. We are already displaying the wait event "ReplSlotsyncPrimaryCatchup" in pg_stat_activity so that might already be enough? My main idea was to be able to manually create/sync logical_slot2 in the test case described in [1] without waiting for activity on logical_slot1. But another (better?) option might be to change our current algorithm during slot creation on the standby? (to avoid an "active" slot having to wait on a "inactive" one, like described in [1]). [1]: https://www.postgresql.org/message-id/afe4ab6c-dde3-48ea-acd8-6f6052c7b8fd%40gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
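For reference, checking the wait event mentioned above only needs pg_stat_activity; the wait event name ReplSlotsyncPrimaryCatchup is defined by the patch, so the query only returns rows on a standby running with the patch applied:

    -- On the standby: show processes currently waiting for the primary's
    -- slot to catch up (wait event name as defined by the patch).
    SELECT pid, backend_type, wait_event_type, wait_event
    FROM   pg_stat_activity
    WHERE  wait_event = 'ReplSlotsyncPrimaryCatchup';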
On Fri, Oct 27, 2023 at 9:00 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/27/23 10:35 AM, Amit Kapila wrote: > > On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand > > > > I think even if we provide such an API, we need to have logic to get > > the slots from the primary and create them. > > Yeah, my idea was to add an API (in addition to what is already in place). > > > Say, even if the user used > > the APIs, there may still be some new slots that the sync worker needs > > to create. > > Right. > > > I think it might be better to provide a view for users to > > view the current state of sync. For example, in the above case, we can > > say "waiting for the primary to advance remote LSN" or something like > > that. > > We are already displaying the wait event "ReplSlotsyncPrimaryCatchup" in pg_stat_activity > so that might already be enough? > I am fine if the wait is already displayed in some form. > My main idea was to be able to manually create/sync logical_slot2 in the test case described in [1] > without waiting for activity on logical_slot1. > > But another (better?) option might be to change our current algorithm during slot creation on the > standby? (to avoid an "active" slot having to wait on a "inactive" one, like described in [1]). > Yeah, I guess it would be better to tweak the algorithm in this case such that the slots that can't be created immediately are noted in a separate list and we continue with the other remaining slots. Once we are finished with all the slots, this special list can be traversed and we can attempt to create the remaining ones. OTOH, the scenario you described doesn't sound like a frequent case to worry about, but if we can deal with it without adding much complexity then it would be good. -- With Regards, Amit Kapila.
On Thu, Oct 26, 2023 at 6:08 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > > PFA v25 patch set. The changes are: > > Thanks for making the patch! It seems that there are lots of comments, so > I can put some high-level comments for 0001. > Sorry if there are duplicated comments. > > 1. > The patch seemed not to consider the case that failover option between replication > slot and subscription were different. Currently slot option will be overwritten > by subscription one. > > Actually, I'm not sure what specification is better. Regarding the two_phase, > 2PC will be decoded only when the both of settings are true. Should we follow? > But this is the intention, we want the Alter subscription to be able to change the failover behaviour of the slot. > 2. > Currently ctx->failover is set only in the pgoutput_startup(), but not sure it is OK. > Can we change the parameter in CreateDecodingContext() or similar functions? > > Because IIUC it means that only slots which have pgoutput can wait. Other > output plugins must understand the change and set faliover flag as well - > I felt it is not good. E.g., you might miss to enable the parameter in test_decoding. > > Regarding the two_phase parameter, setting on plugin layer is good because it > quite affects the output. As for the failover, it is not related with the > content so that all of slots should be enabled. > > I think CreateDecodingContext or StartupDecodingContext() is the common path. > Or, is it the out-of-scope for now? Currently, the failover field is part of the options list in the StartReplicationCmd. This gives some level of flexibility such that only plugins that are interested in this need to handle it. The options list is only deparsed by plugins. If we move it to outside of the options list, this sort of changes the protocol for START_REPLICATION and will impact all plugins. But I agree to your larger point that, we need to do it in such a way that other plugins do not unintentionally change the 'failover' behaviour of the originally created slot. Maybe I can code it in such a way that, only if the failover option is specified in the list of options passed as part of START_REPLICATION will it change the original slot created 'failover' flag by adding another flag "failover_opt_given". Plugins that set this, will be able to change the failover flag of the slot, while plugins that do not support this will not set this and the failover flag of the created slot will remain. What do you think? regards, Ajin Cherian Fujitsu Australia
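The "failover_opt_given" idea above could look roughly like the following inside an output plugin's option parsing (a sketch only; parse_output_parameters() in pgoutput would be the obvious place, and the failover, failover_given and failover_option_given names are illustrative, not existing fields):

    else if (strcmp(defel->defname, "failover") == 0)
    {
        if (failover_option_given)
            ereport(ERROR,
                    (errcode(ERRCODE_SYNTAX_ERROR),
                     errmsg("conflicting or redundant options")));
        failover_option_given = true;

        /* Remember both the value and the fact that it was supplied. */
        data->failover = defGetBoolean(defel);
        data->failover_given = true;    /* sketch: patch-specific field */
    }

With that, the slot's failover flag would only be touched when the plugin actually received the option, so plugins that never pass it leave the flag as it was at slot creation.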
Dear Ajin, Thanks for your reply! > On Thu, Oct 26, 2023 at 6:08 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear Shveta, > > > > > PFA v25 patch set. The changes are: > > > > Thanks for making the patch! It seems that there are lots of comments, so > > I can put some high-level comments for 0001. > > Sorry if there are duplicated comments. > > > > 1. > > The patch seemed not to consider the case that failover option between > replication > > slot and subscription were different. Currently slot option will be overwritten > > by subscription one. > > > > Actually, I'm not sure what specification is better. Regarding the two_phase, > > 2PC will be decoded only when the both of settings are true. Should we follow? > > > > But this is the intention, we want the Alter subscription to be able > to change the failover behaviour > of the slot. I had not understood how two_phase is enabled. I found that slot->data.two_phase is overwritten in CreateDecodingContext(), so the failover option now follows two_phase, right? (I think the overwritten of data.failover should be also done at CreateDecodingContext()). > > 2. > > Currently ctx->failover is set only in the pgoutput_startup(), but not sure it is > OK. > > Can we change the parameter in CreateDecodingContext() or similar functions? > > > > Because IIUC it means that only slots which have pgoutput can wait. Other > > output plugins must understand the change and set faliover flag as well - > > I felt it is not good. E.g., you might miss to enable the parameter in > test_decoding. > > > > Regarding the two_phase parameter, setting on plugin layer is good because it > > quite affects the output. As for the failover, it is not related with the > > content so that all of slots should be enabled. > > > > I think CreateDecodingContext or StartupDecodingContext() is the common > path. > > Or, is it the out-of-scope for now? > > Currently, the failover field is part of the options list in the > StartReplicationCmd. This gives some > level of flexibility such that only plugins that are interested in > this need to handle it. The options list > is only deparsed by plugins. If we move it to outside of the options list, > this sort of changes the protocol for START_REPLICATION and will > impact all plugins. > But I agree to your larger point that, we need to do it in such a way that > other plugins do not unintentionally change the 'failover' behaviour > of the originally created slot. > Maybe I can code it in such a way that, only if the failover option is > specified in the list of options > passed as part of START_REPLICATION will it change the original slot > created 'failover' flag by adding > another flag "failover_opt_given". Plugins that set this, will be able > to change the failover flag of the slot, > while plugins that do not support this will not set this and the > failover flag of the created slot will remain. > What do you think? May be OK, but I came up with a corner case that external plugins have a streaming option 'failover'. What should be? Has the option been reserved? Best Regards, Hayato Kuroda FUJITSU LIMITED
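To make the CreateDecodingContext() suggestion concrete: mirroring roughly what that function already does for two_phase, the slot's persistent failover flag could be updated in the common decoding path instead of in pgoutput, so every output plugin gets the same behaviour. A rough sketch (ctx->failover and slot->data.failover are the patch's fields; the exact condition is illustrative):

    /*
     * Sketch: in CreateDecodingContext(), after the output plugin options
     * have been parsed, persist a changed failover setting on the slot.
     */
    if (ctx->failover != slot->data.failover)
    {
        SpinLockAcquire(&slot->mutex);
        slot->data.failover = ctx->failover;
        SpinLockRelease(&slot->mutex);

        ReplicationSlotMarkDirty();
        ReplicationSlotSave();
    }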
On Tue, Oct 31, 2023 at 7:16 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > > 2. > > > Currently ctx->failover is set only in the pgoutput_startup(), but not sure it is > > OK. > > > Can we change the parameter in CreateDecodingContext() or similar functions? > > > > > > Because IIUC it means that only slots which have pgoutput can wait. Other > > > output plugins must understand the change and set faliover flag as well - > > > I felt it is not good. E.g., you might miss to enable the parameter in > > test_decoding. > > > > > > Regarding the two_phase parameter, setting on plugin layer is good because it > > > quite affects the output. As for the failover, it is not related with the > > > content so that all of slots should be enabled. > > > > > > I think CreateDecodingContext or StartupDecodingContext() is the common > > path. > > > Or, is it the out-of-scope for now? > > > > Currently, the failover field is part of the options list in the > > StartReplicationCmd. This gives some > > level of flexibility such that only plugins that are interested in > > this need to handle it. The options list > > is only deparsed by plugins. If we move it to outside of the options list, > > this sort of changes the protocol for START_REPLICATION and will > > impact all plugins. > > But I agree to your larger point that, we need to do it in such a way that > > other plugins do not unintentionally change the 'failover' behaviour > > of the originally created slot. > > Maybe I can code it in such a way that, only if the failover option is > > specified in the list of options > > passed as part of START_REPLICATION will it change the original slot > > created 'failover' flag by adding > > another flag "failover_opt_given". Plugins that set this, will be able > > to change the failover flag of the slot, > > while plugins that do not support this will not set this and the > > failover flag of the created slot will remain. > > What do you think? > > May be OK, but I came up with a corner case that external plugins have a streaming > option 'failover'. What should be? Has the option been reserved? > Sorry, your question is not clear to me. Did you intend to say that the value of the existing streaming option could be 'failover'? -- With Regards, Amit Kapila.
On Tue, Oct 31, 2023 at 11:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 31, 2023 at 7:16 AM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > > > 2. > > > > Currently ctx->failover is set only in the pgoutput_startup(), but not sure it is > > > OK. > > > > Can we change the parameter in CreateDecodingContext() or similar functions? > > > > > > > > Because IIUC it means that only slots which have pgoutput can wait. Other > > > > output plugins must understand the change and set faliover flag as well - > > > > I felt it is not good. E.g., you might miss to enable the parameter in > > > test_decoding. > > > > > > > > Regarding the two_phase parameter, setting on plugin layer is good because it > > > > quite affects the output. As for the failover, it is not related with the > > > > content so that all of slots should be enabled. > > > > > > > > I think CreateDecodingContext or StartupDecodingContext() is the common > > > path. > > > > Or, is it the out-of-scope for now? > > > > > > Currently, the failover field is part of the options list in the > > > StartReplicationCmd. This gives some > > > level of flexibility such that only plugins that are interested in > > > this need to handle it. The options list > > > is only deparsed by plugins. If we move it to outside of the options list, > > > this sort of changes the protocol for START_REPLICATION and will > > > impact all plugins. > > > But I agree to your larger point that, we need to do it in such a way that > > > other plugins do not unintentionally change the 'failover' behaviour > > > of the originally created slot. > > > Maybe I can code it in such a way that, only if the failover option is > > > specified in the list of options > > > passed as part of START_REPLICATION will it change the original slot > > > created 'failover' flag by adding > > > another flag "failover_opt_given". Plugins that set this, will be able > > > to change the failover flag of the slot, > > > while plugins that do not support this will not set this and the > > > failover flag of the created slot will remain. > > > What do you think? > > > > May be OK, but I came up with a corner case that external plugins have a streaming > > option 'failover'. What should be? Has the option been reserved? > > > > Sorry, your question is not clear to me. Did you intend to say that > the value of the existing streaming option could be 'failover'? > > -- > With Regards, > Amit Kapila. PFA v27 patch-set which has below changes: 1) Enhanced WalSndWait to replace ConditionVariableSleep on WalSndCtl->wal_confirm_rcv_cv as per suggestion in [1]. 2) WalSndWaitForWal and WalSndWaitForStandbyConfirmation is now integrated as per suggestion in [2]. WalSndWait is invoked only once. 3) Optimized slot-creation algorithm on standby as per suggestion in [3]. Now, during the first attempt of slots-creation we create all active slots and add inactive ones to the pending list and then we wait on them in the second attempt. 4) Added basic tests for failover slots. Changes for 1 and 2 are in patch001 and for 3 and 4 are in patch002. Thanks Hou-San for implementing changes for 1 and 2. Thanks Ajin for implementing failover tests/4. 
[1]: https://www.postgresql.org/message-id/f3228cfb-7bf3-4bd8-8f37-c55fc4054759%40gmail.com [2]: https://www.postgresql.org/message-id/CAA4eK1J49j5ew-Tk4Ygv0nbjurJz12kZtqjHLALFuL03NBZdsg%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com thanks Shveta
Attachment
On Thu, Oct 26, 2023 at 5:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 26, 2023 at 5:38 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > On 10/26/23 10:40 AM, Amit Kapila wrote: > > > On Wed, Oct 25, 2023 at 8:49 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > >> > > > > > > Good point, I think we should enhance the WalSndWait() logic to > > > address this case. > > > > Agree. I think it would need to take care of the new CV and probably > > provide a way for the caller to detect it stopped waiting due to the socket > > (I don't think it can find out currently). > > > > > Additionally, I think we should ensure that > > > WalSndWaitForWal() shouldn't wait twice once for wal_flush and a > > > second time for wal to be replayed by physical standby. It should be > > > okay to just wait for Wal to be replayed by physical standby when > > > applicable, otherwise, just wait for Wal to flush as we are doing now. > > > Does that make sense? > > > > Yeah, I think so. What about moving WalSndWaitForStandbyConfirmation() > > outside of WalSndWaitForWal() and call one or the other in logical_read_xlog_page()? > > > > I think we need to somehow integrate the logic of both functions. Let > us see what the patch author has to say about this. Amit, this is attempted in v27. thanks Shveta
On Fri, Oct 27, 2023 at 8:43 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 10/27/23 11:56 AM, shveta malik wrote: > > On Wed, Oct 25, 2023 at 3:15 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 10/25/23 5:00 AM, shveta malik wrote: > >>> On Tue, Oct 24, 2023 at 11:54 AM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >>>> > >>>> Hi, > >>>> > >>>> On 10/23/23 2:56 PM, shveta malik wrote: > >>>>> On Mon, Oct 23, 2023 at 5:52 PM Drouvot, Bertrand > >>>>> <bertranddrouvot.pg@gmail.com> wrote: > >>>> > >>>>>> We are waiting for DEFAULT_NAPTIME_PER_CYCLE (3 minutes) before checking if there > >>>>>> is new synced slot(s) to be created on the standby. Do we want to keep this behavior > >>>>>> for V1? > >>>>>> > >>>>> > >>>>> I think for the slotsync workers case, we should reduce the naptime in > >>>>> the launcher to say 30sec and retain the default one of 3mins for > >>>>> subscription apply workers. Thoughts? > >>>>> > >>>> > >>>> Another option could be to keep DEFAULT_NAPTIME_PER_CYCLE and create a new > >>>> API on the standby that would refresh the list of sync slot at wish, thoughts? > >>>> > >>> > >>> Do you mean API to refresh list of DBIDs rather than sync-slots? > >>> As per current design, launcher gets DBID lists for all the failover > >>> slots from the primary at intervals of DEFAULT_NAPTIME_PER_CYCLE. > >> > >> I mean an API to get a newly created slot on the primary being created/synced on > >> the standby at wish. > >> > >> Also let's imagine this scenario: > >> > >> - create logical_slot1 on the primary (and don't start using it) > >> > >> Then on the standby we'll get things like: > >> > >> 2023-10-25 08:33:36.897 UTC [740298] LOG: waiting for remote slot "logical_slot1" LSN (0/C00316A0) and catalog xmin(752) to pass local slot LSN (0/C0049530) and and catalog xmin (754) > >> > >> That's expected and due to the fact that ReplicationSlotReserveWal() does set the slot > >> restart_lsn to a value < at the corresponding restart_lsn slot on the primary. > >> > >> - create logical_slot2 on the primary (and start using it) > >> > >> Then logical_slot2 won't be created/synced on the standby until there is activity on logical_slot1 on the primary > >> that would produce things like: > >> 2023-10-25 08:41:35.508 UTC [740298] LOG: wait over for remote slot "logical_slot1" as its LSN (0/C005FFD8) and catalogxmin (756) has now passed local slot LSN (0/C0049530) and catalog xmin (754) > > > > > > Slight correction to above. As soon as we start activity on > > logical_slot2, it will impact all the slots on primary, as the WALs > > are consumed by all the slots. So even if there is activity on > > logical_slot2, logical_slot1 creation on standby will be unblocked and > > it will then move to logical_slot2 creation. eg: > > > > --on standby: > > 2023-10-27 15:15:46.069 IST [696884] LOG: waiting for remote slot > > "mysubnew1_1" LSN (0/3C97970) and catalog xmin (756) to pass local > > slot LSN (0/3C979A8) and and catalog xmin (756) > > > > on primary: > > newdb1=# select now(); > > now > > ---------------------------------- > > 2023-10-27 15:15:51.504835+05:30 > > (1 row) > > > > --activity on mysubnew1_3 > > newdb1=# insert into tab1_3 values(1); > > INSERT 0 1 > > newdb1=# select now(); > > now > > ---------------------------------- > > 2023-10-27 15:15:54.651406+05:30 > > > > > > --on standby, mysubnew1_1 is unblocked. 
> > 2023-10-27 15:15:56.223 IST [696884] LOG: wait over for remote slot > > "mysubnew1_1" as its LSN (0/3C97A18) and catalog xmin (757) has now > > passed local slot LSN (0/3C979A8) and catalog xmin (756) > > > > My Setup: > > mysubnew1_1 -->mypubnew1_1 -->tab1_1 > > mysubnew1_3 -->mypubnew1_3-->tab1_3 > > > > Agree with your test case, but in my case I was not using pub/sub. > > I was not clear, so when I said: > > >> - create logical_slot1 on the primary (and don't start using it) > > I meant don't start decoding from it (like using pg_recvlogical() or > pg_logical_slot_get_changes()). > > By using pub/sub the "don't start using it" is not satisfied. > > My test case is: > > " > SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true); > SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true); > pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f - > " > Okay, I am able to reproduce it now. Thanks for the clarification. I have tried to change the algorithm as per Amit's suggestion in [1]. [1]: https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com This is not a foolproof solution but an optimization over the first one. Now, in any sync-cycle, we take 2 attempts at slot creation (if any slots are available to be created). In the first attempt, we do not wait indefinitely on inactive slots; we wait only for a fixed amount of time and, if the remote slot is still behind, we add it to a pending list and move on to the next slot. Once we are done with the first attempt, in the second attempt we go through the pending ones and wait on each of them until the primary catches up. > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
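The two-attempt scheme described above, compressed into pseudo-C for readability (remote_slot_list, RemoteSlot, SYNC_SLOT_WAIT_MS, sync_slot_with_timeout() and sync_slot_wait_forever() are illustrative placeholders, not the patch's actual symbols):

    List       *pending = NIL;
    ListCell   *lc;

    /* Attempt 1: bounded wait; defer any remote slot that is still behind. */
    foreach(lc, remote_slot_list)
    {
        RemoteSlot *rs = (RemoteSlot *) lfirst(lc);

        if (!sync_slot_with_timeout(rs, SYNC_SLOT_WAIT_MS))
            pending = lappend(pending, rs);
    }

    /* Attempt 2: wait on each deferred slot until the primary catches up. */
    foreach(lc, pending)
    {
        RemoteSlot *rs = (RemoteSlot *) lfirst(lc);

        sync_slot_wait_forever(rs);
    }

This way an inactive slot can no longer block the creation of the other, active slots within the same cycle.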
On Fri, Oct 27, 2023 at 4:04 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > ========== > > Open questions regarding change for pt 1 above: > a) I think we should restrict the 'alter-sub set failover' when > failover-state is currently in 'p' (pending) state i.e. table-sync is > going over. Once table-sync is over, then toggle of 'failover' should > be allowed using alter-subscription. > Agreed. > b) Currently I have restricted 'alter subscription.. refresh > publication with copy=true' when failover=true (on a similar line of > two-phase). The reason being, refresh with copy=true will go for > table-sync again and since failover was set in main-slot after > table-sync was done, it will need going through the same transition of > 'p' to 'e' for main slot making it unsyncable for that time. Should it > be allowed? > Yeah, I also think we can't allow refresh with copy=true when 'failover' is enabled. I think the current implementation of this flag seems a bit clumsy because 'failover' is a slot property and we are trying to map it to plugin_options. It has to be considered similar to the opt_temporary option while creating the slot. We have create_replication_slot and drop_replication_slot in repl_gram.y. How about if introduce alter_replication_slot and handle the 'failover' flag with that? The idea is we will either enable 'failover' at the time create_replication_slot by providing an optional failover option or execute a separate command alter_replication_slot. I think we probably need to perform this command before the start of streaming. I think we will have the following options to allow alter of the 'failover' property: (a) we can allow altering 'failover' only for the 'disabled' subscription; to achieve that, we need to open a connection during alter subscription and change this property of slot; (b) apply worker detects the change in 'failover' option; run the alter_replication_slot command; this needs more analysis as apply_worker is already doing streaming and changing slot property in between could be tricky. -- With Regards, Amit Kapila.
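From the user's point of view, option (a) above would amount to something like the following ('failover' as a subscription option is what the patch set proposes, and requiring the subscription to be disabled first is exactly the point under discussion):

    -- Option (a): only a disabled subscription may have its failover
    -- property toggled; the server would then also alter the remote slot.
    ALTER SUBSCRIPTION mysub DISABLE;
    ALTER SUBSCRIPTION mysub SET (failover = true);
    ALTER SUBSCRIPTION mysub ENABLE;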
On Tuesday, October 31, 2023 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > b) Currently I have restricted 'alter subscription.. refresh > > publication with copy=true' when failover=true (on a similar line of > > two-phase). The reason being, refresh with copy=true will go for > > table-sync again and since failover was set in main-slot after > > table-sync was done, it will need going through the same transition of > > 'p' to 'e' for main slot making it unsyncable for that time. Should it > > be allowed? > > > > Yeah, I also think we can't allow refresh with copy=true when 'failover' is > enabled. > > I think the current implementation of this flag seems a bit clumsy because > 'failover' is a slot property and we are trying to map it to plugin_options. It has > to be considered similar to the opt_temporary option while creating the slot. > > We have create_replication_slot and drop_replication_slot in repl_gram.y. How > about if introduce alter_replication_slot and handle the 'failover' flag with that? > The idea is we will either enable 'failover' at the time create_replication_slot by > providing an optional failover option or execute a separate command > alter_replication_slot. I think we probably need to perform this command > before the start of streaming. > > I think we will have the following options to allow alter of the 'failover' > property: (a) we can allow altering 'failover' only for the 'disabled' subscription; > to achieve that, we need to open a connection during alter subscription and > change this property of slot; (b) apply worker detects the change in 'failover' > option; run the alter_replication_slot command; this needs more analysis as > apply_worker is already doing streaming and changing slot property in > between could be tricky. I think for approach b), one challenge is the handling of the error case. E.g. If the apply worker errored out when executing the alter_replication_slot command, it may not be able to retry that after restarting, because it won't know if the value has changed before. (Or we have to execute alter_replication_slot always at the beginning in apply worker which seems not great). Best Regards, Hou zj
On Tuesday, October 31, 2023 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 4:04 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Oct 27, 2023 at 3:26 PM shveta malik <shveta.malik@gmail.com> > wrote: > > ========== > > > > Open questions regarding change for pt 1 above: > > a) I think we should restrict the 'alter-sub set failover' when > > failover-state is currently in 'p' (pending) state i.e. table-sync is > > going over. Once table-sync is over, then toggle of 'failover' should > > be allowed using alter-subscription. > > > > Agreed. > > > b) Currently I have restricted 'alter subscription.. refresh > > publication with copy=true' when failover=true (on a similar line of > > two-phase). The reason being, refresh with copy=true will go for > > table-sync again and since failover was set in main-slot after > > table-sync was done, it will need going through the same transition of > > 'p' to 'e' for main slot making it unsyncable for that time. Should it > > be allowed? > > > > Yeah, I also think we can't allow refresh with copy=true when 'failover' is > enabled. > > I think the current implementation of this flag seems a bit clumsy because > 'failover' is a slot property and we are trying to map it to plugin_options. It has > to be considered similar to the opt_temporary option while creating the slot. > > We have create_replication_slot and drop_replication_slot in repl_gram.y. How > about if introduce alter_replication_slot and handle the 'failover' flag with that? > The idea is we will either enable 'failover' at the time create_replication_slot by > providing an optional failover option or execute a separate command > alter_replication_slot. I think we probably need to perform this command > before the start of streaming. Here is an attempt to achieve the same. I added a new replication command alter_replication_slot and introduced a walreceiver api walrcv_alter_slot to execute the command. The subscription will call the api to enable/disable the failover of the slot on publisher. The patch disallows altering the failover option for the subscription. But we could release the restriction by using the following approaches in next version: > I think we will have the following options to allow alter of the 'failover' > property: (a) we can allow altering 'failover' only for the 'disabled' > subscription; to achieve that, we need to open a connection during alter > subscription and change this property of slot; (b) apply worker detects the > change in 'failover' option; run the alter_replication_slot command; this needs > more analysis as apply_worker is already doing streaming and changing slot > property in between could be tricky. Best Regards, Hou zj
Attachment
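For readers who want the subscriber-side picture of the walrcv_alter_slot approach above, a minimal sketch under the patch's proposed 'failover' subscription option looks something like this (the connection string, subscription, and publication names are made up; note that this version of the patch disallows altering the option afterwards, so only the create-time form is shown):

    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=primary dbname=postgres user=repl'
        PUBLICATION pub1
        WITH (failover = true);

Under the covers the subscription would then either pass the option at slot creation or call the new alter_replication_slot command, as described above.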
On Thursday, November 2, 2023 8:27 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, October 31, 2023 6:45 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Fri, Oct 27, 2023 at 4:04 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > > > On Fri, Oct 27, 2023 at 3:26 PM shveta malik > > > <shveta.malik@gmail.com> > > wrote: > > > ========== > > > > > > Open questions regarding change for pt 1 above: > > > a) I think we should restrict the 'alter-sub set failover' when > > > failover-state is currently in 'p' (pending) state i.e. table-sync > > > is going over. Once table-sync is over, then toggle of 'failover' > > > should be allowed using alter-subscription. > > > > > > > Agreed. > > > > > b) Currently I have restricted 'alter subscription.. refresh > > > publication with copy=true' when failover=true (on a similar line of > > > two-phase). The reason being, refresh with copy=true will go for > > > table-sync again and since failover was set in main-slot after > > > table-sync was done, it will need going through the same transition > > > of 'p' to 'e' for main slot making it unsyncable for that time. > > > Should it be allowed? > > > > > > > Yeah, I also think we can't allow refresh with copy=true when > > 'failover' is enabled. > > > > I think the current implementation of this flag seems a bit clumsy > > because 'failover' is a slot property and we are trying to map it to > > plugin_options. It has to be considered similar to the opt_temporary option > while creating the slot. > > > > We have create_replication_slot and drop_replication_slot in > > repl_gram.y. How about if introduce alter_replication_slot and handle the > 'failover' flag with that? > > The idea is we will either enable 'failover' at the time > > create_replication_slot by providing an optional failover option or > > execute a separate command alter_replication_slot. I think we probably > > need to perform this command before the start of streaming. > > Here is an attempt to achieve the same. I added a new replication command > alter_replication_slot and introduced a walreceiver api walrcv_alter_slot to > execute the command. The subscription will call the api to enable/disable the > failover of the slot on publisher. Here is the new version patch set(V29) which addressed Peter comments[1][2] and fixed one doc compile error. Thanks Ajin for helping address some of the comments. [1] https://www.postgresql.org/message-id/CAHut%2BPspseC03Fhsi%3DOqOtksagspE%2B0MVOhrhhUb64cc_4SE1w%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPubYbmLpGeOd2QTBPhHwtZa-Qm9Kg38Cu_EiG%2B1RbV47g%40mail.gmail.com Best Regards, Hou zj
Attachment
On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the new version patch set(V29) which addressed Peter comments[1][2] and > fixed one doc compile error. > Few comments: ============== 1. + <varlistentry id="sql-createsubscription-params-with-failover"> + <term><literal>failover</literal> (<type>boolean</type>)</term> + <listitem> + <para> + Specifies whether the replication slot assocaited with the subscription + is enabled to be synced to the physical standbys so that logical + replication can be resumed from the new primary after failover. + The default is <literal>true</literal>. Why do you think it is a good idea to keep the default value as true? I think the user needs to enable standby for syncing slots which is not a default feature, so by default, the failover property should also be false. AFAICS, it is false for create_slot SQL API as per the below change; so that way also keeping default true for a subscription doesn't make sense. @@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION pg_create_logical_replication_slot( IN slot_name name, IN plugin name, IN temporary boolean DEFAULT false, IN twophase boolean DEFAULT false, + IN failover boolean DEFAULT false, OUT slot_name name, OUT lsn pg_lsn) BTW, the below change indicates that the code treats default as false; so, it seems to be a documentation error. @@ -157,6 +158,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options, opts->runasowner = false; if (IsSet(supported_opts, SUBOPT_ORIGIN)) opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY); + if (IsSet(supported_opts, SUBOPT_FAILOVER)) + opts->failover = false; 2. - /* * Common option parsing function for CREATE and ALTER SUBSCRIPTION commands. * Spurious line removal. 3. + else if (opts.slot_name && failover_enabled) + { + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover); + ereport(NOTICE, + (errmsg("altered replication slot \"%s\" on publisher", + opts.slot_name))); + } I think we can add a comment to describe why it makes sense to enable the failover property of the slot in this case. Can we change the notice message to: "enabled failover for replication slot \"%s\" on publisher" 4. libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname, - bool temporary, bool two_phase, CRSSnapshotAction snapshot_action, - XLogRecPtr *lsn) + bool temporary, bool two_phase, bool failover, + CRSSnapshotAction snapshot_action, XLogRecPtr *lsn) { PGresult *res; StringInfoData cmd; @@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname, else appendStringInfoChar(&cmd, ' '); } - + if (failover) + { + appendStringInfoString(&cmd, "FAILOVER"); + if (use_new_options_syntax) + appendStringInfoString(&cmd, ", "); + else + appendStringInfoChar(&cmd, ' '); + } I don't see a corresponding change in repl_gram.y. I think the following part of the code needs to be changed: /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */ | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT create_slot_options You also need to update the docs for the same. See [1]. 5. @@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin NameStr(MyReplicationSlot->data.plugin), format_procedure(fcinfo->flinfo->fn_oid)))); .. 
+ if (XLogRecPtrIsInvalid(upto_lsn)) + wal_to_wait = end_of_wal; + else + wal_to_wait = Min(upto_lsn, end_of_wal); + + /* Initialize standby_slot_names_list */ + SlotSyncInitConfig(); + + /* + * Wait for specified streaming replication standby servers (if any) + * to confirm receipt of WAL upto wal_to_wait. + */ + WalSndWaitForStandbyConfirmation(wal_to_wait); + + /* + * The memory context used to allocate standby_slot_names_list will be + * freed at the end of this call. So free and nullify the list in + * order to avoid usage of freed list in the next call to this + * function. + */ + SlotSyncFreeConfig(); What if there is an error in WalSndWaitForStandbyConfirmation() before calling SlotSyncFreeConfig()? I think the problem you are trying to avoid by freeing it here can occur. I think it is better to do this in a logical decoding context and free the list along with it as we are doing in commit c7256e6564(see PG15). Also, it is better to allocate this list somewhere in WalSndWaitForStandbyConfirmation(), probably in WalSndGetStandbySlots, that will make the code look neat and also avoid allocating this list when failover is not enabled for the slot. 6. +/* ALTER_REPLICATION_SLOT slot */ +alter_replication_slot: + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')' I think you need to update the docs for this new command. See existing docs [1]. [1] - https://www.postgresql.org/docs/devel/protocol-replication.html -- With Regards, Amit Kapila.
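As a small illustration of the SQL-level API change quoted in point 1, the new optional argument would be used roughly like this (slot and plugin names are illustrative; the argument order follows the signature shown in the diff):

    SELECT * FROM pg_create_logical_replication_slot(
               'failover_slot',   -- slot_name
               'test_decoding',   -- plugin
               false,             -- temporary
               false,             -- twophase
               true);             -- failover (the patch's new parameter)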
On Friday, November 3, 2023 7:32 PM Amit Kapila <amit.kapila16@gmail.com> > > On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Here is the new version patch set(V29) which addressed Peter > > comments[1][2] and fixed one doc compile error. > > > > Few comments: > ============== > 1. > + <varlistentry id="sql-createsubscription-params-with-failover"> > + <term><literal>failover</literal> (<type>boolean</type>)</term> > + <listitem> > + <para> > + Specifies whether the replication slot assocaited with the > subscription > + is enabled to be synced to the physical standbys so that logical > + replication can be resumed from the new primary after failover. > + The default is <literal>true</literal>. > > Why do you think it is a good idea to keep the default value as true? > I think the user needs to enable standby for syncing slots which is not a default > feature, so by default, the failover property should also be false. AFAICS, it is > false for create_slot SQL API as per the below change; so that way also keeping > default true for a subscription doesn't make sense. > @@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION > pg_create_logical_replication_slot( > IN slot_name name, IN plugin name, > IN temporary boolean DEFAULT false, > IN twophase boolean DEFAULT false, > + IN failover boolean DEFAULT false, > OUT slot_name name, OUT lsn pg_lsn) > > BTW, the below change indicates that the code treats default as false; so, it > seems to be a documentation error. I think the document is wrong and fixed it. > > 2. > - > /* > * Common option parsing function for CREATE and ALTER SUBSCRIPTION > commands. > * > > Spurious line removal. > > 3. > + else if (opts.slot_name && failover_enabled) { > + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover); > + ereport(NOTICE, (errmsg("altered replication slot \"%s\" on > + publisher", opts.slot_name))); } > > I think we can add a comment to describe why it makes sense to enable the > failover property of the slot in this case. Can we change the notice message to: > "enabled failover for replication slot \"%s\" on publisher" Added. > > 4. > libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname, > - bool temporary, bool two_phase, CRSSnapshotAction snapshot_action, > - XLogRecPtr *lsn) > + bool temporary, bool two_phase, bool failover, CRSSnapshotAction > + snapshot_action, XLogRecPtr *lsn) > { > PGresult *res; > StringInfoData cmd; > @@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const > char *slotname, > else > appendStringInfoChar(&cmd, ' '); > } > - > + if (failover) > + { > + appendStringInfoString(&cmd, "FAILOVER"); if (use_new_options_syntax) > + appendStringInfoString(&cmd, ", "); else appendStringInfoChar(&cmd, ' > + '); } > > I don't see a corresponding change in repl_gram.y. I think the following part of > the code needs to be changed: > /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */ > | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT > create_slot_options > I think after 0266e98, we started to use the new syntax(see the generic_option_list rule) and we can avoid changing the repl_gram.y when adding new options. The new failover can be detected when parsing the generic option list(in parseCreateReplSlotOptions). > > 5. > @@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo > fcinfo, bool confirm, bool bin > NameStr(MyReplicationSlot->data.plugin), > format_procedure(fcinfo->flinfo->fn_oid)))); > .. 
> + if (XLogRecPtrIsInvalid(upto_lsn)) > + wal_to_wait = end_of_wal; > + else > + wal_to_wait = Min(upto_lsn, end_of_wal); > + > + /* Initialize standby_slot_names_list */ SlotSyncInitConfig(); > + > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL upto wal_to_wait. > + */ > + WalSndWaitForStandbyConfirmation(wal_to_wait); > + > + /* > + * The memory context used to allocate standby_slot_names_list will be > + * freed at the end of this call. So free and nullify the list in > + * order to avoid usage of freed list in the next call to this > + * function. > + */ > + SlotSyncFreeConfig(); > > What if there is an error in WalSndWaitForStandbyConfirmation() before calling > SlotSyncFreeConfig()? I think the problem you are trying to avoid by freeing it > here can occur. I think it is better to do this in a logical decoding context and > free the list along with it as we are doing in commit c7256e6564(see PG15). I will analyze more about this case and update in next version. > Also, > it is better to allocate this list somewhere in > WalSndWaitForStandbyConfirmation(), probably in WalSndGetStandbySlots, > that will make the code look neat and also avoid allocating this list when > failover is not enabled for the slot. Changed as suggested. > > 6. > +/* ALTER_REPLICATION_SLOT slot */ > +alter_replication_slot: > + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')' > > I think you need to update the docs for this new command. See existing docs > [1]. > > [1] - https://www.postgresql.org/docs/devel/protocol-replication.html I think the doc for alter_replication_slot was added in V29. Attach the V30 patch set which addressed above comments and fixed CFbot failures. Best Regards, Hou zj
Attachment
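To restate the standby_slot_names piece of the patch in user-facing terms, the primary would be configured with something like the following so that failover-enabled slots wait for the listed physical standbys (the GUC name comes from the patch; the slot name and the assumption that a reload is enough to apply it are mine):

    -- on the primary
    ALTER SYSTEM SET standby_slot_names = 'standby1_phys_slot';
    SELECT pg_reload_conf();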
On Mon, Nov 6, 2023 at 7:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, November 3, 2023 7:32 PM Amit Kapila <amit.kapila16@gmail.com> > > > > On Thu, Nov 2, 2023 at 2:35 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > > wrote: > > > > > > Here is the new version patch set(V29) which addressed Peter > > > comments[1][2] and fixed one doc compile error. > > > > > > > Few comments: > > ============== > > 1. > > + <varlistentry id="sql-createsubscription-params-with-failover"> > > + <term><literal>failover</literal> (<type>boolean</type>)</term> > > + <listitem> > > + <para> > > + Specifies whether the replication slot assocaited with the > > subscription > > + is enabled to be synced to the physical standbys so that logical > > + replication can be resumed from the new primary after failover. > > + The default is <literal>true</literal>. > > > > Why do you think it is a good idea to keep the default value as true? > > I think the user needs to enable standby for syncing slots which is not a default > > feature, so by default, the failover property should also be false. AFAICS, it is > > false for create_slot SQL API as per the below change; so that way also keeping > > default true for a subscription doesn't make sense. > > @@ -479,6 +479,7 @@ CREATE OR REPLACE FUNCTION > > pg_create_logical_replication_slot( > > IN slot_name name, IN plugin name, > > IN temporary boolean DEFAULT false, > > IN twophase boolean DEFAULT false, > > + IN failover boolean DEFAULT false, > > OUT slot_name name, OUT lsn pg_lsn) > > > > BTW, the below change indicates that the code treats default as false; so, it > > seems to be a documentation error. > > I think the document is wrong and fixed it. > > > > > 2. > > - > > /* > > * Common option parsing function for CREATE and ALTER SUBSCRIPTION > > commands. > > * > > > > Spurious line removal. > > > > 3. > > + else if (opts.slot_name && failover_enabled) { > > + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover); > > + ereport(NOTICE, (errmsg("altered replication slot \"%s\" on > > + publisher", opts.slot_name))); } > > > > I think we can add a comment to describe why it makes sense to enable the > > failover property of the slot in this case. Can we change the notice message to: > > "enabled failover for replication slot \"%s\" on publisher" > > Added. > > > > > 4. > > libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname, > > - bool temporary, bool two_phase, CRSSnapshotAction snapshot_action, > > - XLogRecPtr *lsn) > > + bool temporary, bool two_phase, bool failover, CRSSnapshotAction > > + snapshot_action, XLogRecPtr *lsn) > > { > > PGresult *res; > > StringInfoData cmd; > > @@ -913,7 +917,14 @@ libpqrcv_create_slot(WalReceiverConn *conn, const > > char *slotname, > > else > > appendStringInfoChar(&cmd, ' '); > > } > > - > > + if (failover) > > + { > > + appendStringInfoString(&cmd, "FAILOVER"); if (use_new_options_syntax) > > + appendStringInfoString(&cmd, ", "); else appendStringInfoChar(&cmd, ' > > + '); } > > > > I don't see a corresponding change in repl_gram.y. I think the following part of > > the code needs to be changed: > > /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */ > > | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT > > create_slot_options > > > > I think after 0266e98, we started to use the new syntax(see the > generic_option_list rule) and we can avoid changing the repl_gram.y when adding > new options. 
The new failover can be detected when parsing the generic option > list(in parseCreateReplSlotOptions). > > > > > > 5. > > @@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo > > fcinfo, bool confirm, bool bin > > NameStr(MyReplicationSlot->data.plugin), > > format_procedure(fcinfo->flinfo->fn_oid)))); > > .. > > + if (XLogRecPtrIsInvalid(upto_lsn)) > > + wal_to_wait = end_of_wal; > > + else > > + wal_to_wait = Min(upto_lsn, end_of_wal); > > + > > + /* Initialize standby_slot_names_list */ SlotSyncInitConfig(); > > + > > + /* > > + * Wait for specified streaming replication standby servers (if any) > > + * to confirm receipt of WAL upto wal_to_wait. > > + */ > > + WalSndWaitForStandbyConfirmation(wal_to_wait); > > + > > + /* > > + * The memory context used to allocate standby_slot_names_list will be > > + * freed at the end of this call. So free and nullify the list in > > + * order to avoid usage of freed list in the next call to this > > + * function. > > + */ > > + SlotSyncFreeConfig(); > > > > What if there is an error in WalSndWaitForStandbyConfirmation() before calling > > SlotSyncFreeConfig()? I think the problem you are trying to avoid by freeing it > > here can occur. I think it is better to do this in a logical decoding context and > > free the list along with it as we are doing in commit c7256e6564(see PG15). > > I will analyze more about this case and update in next version. > > > Also, > > it is better to allocate this list somewhere in > > WalSndWaitForStandbyConfirmation(), probably in WalSndGetStandbySlots, > > that will make the code look neat and also avoid allocating this list when > > failover is not enabled for the slot. > > Changed as suggested. > > > > > > 6. > > +/* ALTER_REPLICATION_SLOT slot */ > > +alter_replication_slot: > > + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')' > > > > I think you need to update the docs for this new command. See existing docs > > [1]. > > > > [1] - https://www.postgresql.org/docs/devel/protocol-replication.html > > I think the doc for alter_replication_slot was added in V29. > > Attach the V30 patch set which addressed above comments and fixed CFbot failures. > Thanks Hou-San for the patches. + /* The primary_slot_name is not set */ + if (!WalRcv || WalRcv->slotname[0] == '\0') + { + ereport(WARNING, + errmsg("skipping slots synchronization as primary_slot_name " + "is not set.")); + + /* + * It's possible that the Walreceiver has not been started yet, adjust + * the wait_time to retry sooner in the next synchronization cycle. + */ + *wait_time = wal_retrieve_retry_interval; + return NULL; + } + if (!RecoveryInProgress()) + LaunchSubscriptionApplyWorker(&wait_time); + else if (wrconn == NULL) + wrconn = slotsync_remote_connect(&wait_time); If primary_slot_name is genuinely missing, then the launcher will keep on attempting to reconnect and will keep on logging warnings which is not good. 2023-11-06 09:31:32.206 IST [1032781] WARNING: skipping slots synchronization as primary_slot_name is not set. 2023-11-06 09:31:37.212 IST [1032781] WARNING: skipping slots synchronization as primary_slot_name is not set. 2023-11-06 09:31:42.219 IST [1032781] WARNING: skipping slots synchronization as primary_slot_name is not set. Same is true for other parameters checked by slotsync_remote_connect, only the frequency of WARNING msgs will be lesser (after every 3 mins). Perhaps we should try connecting only once during the start of the launcher and then after each configReload? 
To take care of the cfbot failure, where the launcher may start before the WalReceiver and thus may not find WalRcv->slotname[0] set, should we instead check the GUC primary_slot_name directly in the launcher? Thoughts? thanks Shveta
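For context, the prerequisite being discussed boils down to something like the following on the two nodes (slot name illustrative; treat this as a sketch of the setup the sync worker expects rather than a complete recipe):

    -- on the primary: a physical slot for the standby
    SELECT pg_create_physical_replication_slot('standby1_phys_slot');

    -- on the standby: make the walreceiver use that slot
    ALTER SYSTEM SET primary_slot_name = 'standby1_phys_slot';
    SELECT pg_reload_conf();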
On Mon, Nov 6, 2023 at 7:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, November 3, 2023 7:32 PM Amit Kapila <amit.kapila16@gmail.com> > > > > 5. > > @@ -228,6 +230,28 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo > > fcinfo, bool confirm, bool bin > > NameStr(MyReplicationSlot->data.plugin), > > format_procedure(fcinfo->flinfo->fn_oid)))); > > .. > > + if (XLogRecPtrIsInvalid(upto_lsn)) > > + wal_to_wait = end_of_wal; > > + else > > + wal_to_wait = Min(upto_lsn, end_of_wal); > > + > > + /* Initialize standby_slot_names_list */ SlotSyncInitConfig(); > > + > > + /* > > + * Wait for specified streaming replication standby servers (if any) > > + * to confirm receipt of WAL upto wal_to_wait. > > + */ > > + WalSndWaitForStandbyConfirmation(wal_to_wait); > > + > > + /* > > + * The memory context used to allocate standby_slot_names_list will be > > + * freed at the end of this call. So free and nullify the list in > > + * order to avoid usage of freed list in the next call to this > > + * function. > > + */ > > + SlotSyncFreeConfig(); > > > > What if there is an error in WalSndWaitForStandbyConfirmation() before calling > > SlotSyncFreeConfig()? I think the problem you are trying to avoid by freeing it > > here can occur. I think it is better to do this in a logical decoding context and > > free the list along with it as we are doing in commit c7256e6564(see PG15). > > I will analyze more about this case and update in next version. > Okay, thanks for considering it. > > Also, > > it is better to allocate this list somewhere in > > WalSndWaitForStandbyConfirmation(), probably in WalSndGetStandbySlots, > > that will make the code look neat and also avoid allocating this list when > > failover is not enabled for the slot. > > Changed as suggested. > After doing this, do we need to call SlotSyncInitConfig() from other places as below? + SlotSyncInitConfig(); + WalSndGetStandbySlots(&standby_slot_cpy, false); Can we entirely get rid of calling SlotSyncInitConfig() from all places except WalSndGetStandbySlots()? Also, after that or otherwise, the comments atop also need modification. -- With Regards, Amit Kapila.
On Mon, Nov 6, 2023 at 1:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 6, 2023 at 7:01 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > +static void +WalSndGetStandbySlots(List **standby_slots, bool force) +{ + if (!MyReplicationSlot->data.failover) + return; + + if (standby_slot_names_list == NIL && strcmp(standby_slot_names, "") != 0) + SlotSyncInitConfig(); + + if (force || StandbySlotNamesPreReload == NULL || + strcmp(StandbySlotNamesPreReload, standby_slot_names) != 0) + { + list_free(*standby_slots); + + if (StandbySlotNamesPreReload) + pfree(StandbySlotNamesPreReload); + + StandbySlotNamesPreReload = pstrdup(standby_slot_names); + *standby_slots = list_copy(standby_slot_names_list); + } +} I find this code a bit difficult to understand. I think we don't need to maintain a global variable like StandbySlotNamesPreReload. We can use a local variable for it along the lines of what we do in StartupRereadConfig(). Then, can we think of maintaining standby_slot_names_list in something related to decoding like LogicalDecodingContext as this will be used during decoding only? -- With Regards, Amit Kapila.
Hi, On 10/31/23 10:37 AM, shveta malik wrote: > On Fri, Oct 27, 2023 at 8:43 PM Drouvot, Bertrand >> >> Agree with your test case, but in my case I was not using pub/sub. >> >> I was not clear, so when I said: >> >>>> - create logical_slot1 on the primary (and don't start using it) >> >> I meant don't start decoding from it (like using pg_recvlogical() or >> pg_logical_slot_get_changes()). >> >> By using pub/sub the "don't start using it" is not satisfied. >> >> My test case is: >> >> " >> SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true); >> SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true); >> pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f - >> " >> > > Okay, I am able to reproduce it now. Thanks for clarification. I have > tried to change the algorithm as per suggestion by Amit in [1] > > [1]: https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com Thanks! > > This is not full proof solution but optimization over first one. Now > in any sync-cycle, we take 2 attempts for slots-creation (if any slots > are available to be created). In first attempt, we do not wait > indefinitely on inactive slots, we wait only for a fixed amount of > time and if remote-slot is still behind, then we add that to the > pending list and move to the next slot. Once we are done with first > attempt, in second attempt, we go for the pending ones and now we wait > on each of them until the primary catches up. Aren't we "just" postponing the "issue"? I mean if there is really no activity on, say, the first created slot, then once we move to the second attempt then any newly created slot from that time would wait to be synced forever, no? Looking at V30: + /* Update lsns of slot to remote slot's current position */ + local_slot_update(remote_slot); + ReplicationSlotPersist(); + + ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name)); I think this message is confusing as the slot has been created before it, here: + else + { + TransactionId xmin_horizon = InvalidTransactionId; + ReplicationSlot *slot; + + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, + remote_slot->two_phase, false); So that it shows up in pg_replication_slots before this message is emitted (and that specially true/worst for non active slots). Maybe something like "newly locally created slot XXX has been synced..."? While at it, would that make sense to move + slot->data.failover = true; once we stop waiting for this slot? I think that would avoid confusion if one query pg_replication_slots while we are still waiting for this slot to be synced, thoughts? (currently we can see pg_replication_slots.synced_slot set to true while we are still waiting). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Nov 7, 2023 at 3:51 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 10/31/23 10:37 AM, shveta malik wrote: > > On Fri, Oct 27, 2023 at 8:43 PM Drouvot, Bertrand > >> > >> Agree with your test case, but in my case I was not using pub/sub. > >> > >> I was not clear, so when I said: > >> > >>>> - create logical_slot1 on the primary (and don't start using it) > >> > >> I meant don't start decoding from it (like using pg_recvlogical() or > >> pg_logical_slot_get_changes()). > >> > >> By using pub/sub the "don't start using it" is not satisfied. > >> > >> My test case is: > >> > >> " > >> SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true); > >> SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true); > >> pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f - > >> " > >> > > > > Okay, I am able to reproduce it now. Thanks for clarification. I have > > tried to change the algorithm as per suggestion by Amit in [1] > > > > [1]: https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com > > Thanks! > > > > > This is not full proof solution but optimization over first one. Now > > in any sync-cycle, we take 2 attempts for slots-creation (if any slots > > are available to be created). In first attempt, we do not wait > > indefinitely on inactive slots, we wait only for a fixed amount of > > time and if remote-slot is still behind, then we add that to the > > pending list and move to the next slot. Once we are done with first > > attempt, in second attempt, we go for the pending ones and now we wait > > on each of them until the primary catches up. > > Aren't we "just" postponing the "issue"? I mean if there is really no activity > on, say, the first created slot, then once we move to the second attempt then any newly > created slot from that time would wait to be synced forever, no? > We have to wait at some point in time for such inactive slots and the same is true even for manually created slots on standby. Do you have any better ideas to deal with it? > Looking at V30: > > + /* Update lsns of slot to remote slot's current position */ > + local_slot_update(remote_slot); > + ReplicationSlotPersist(); > + > + ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name)); > > I think this message is confusing as the slot has been created before it, here: > > + else > + { > + TransactionId xmin_horizon = InvalidTransactionId; > + ReplicationSlot *slot; > + > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > + remote_slot->two_phase, false); > > So that it shows up in pg_replication_slots before this message is emitted (and that > specially true/worst for non active slots). > > Maybe something like "newly locally created slot XXX has been synced..."? > > While at it, would that make sense to move > > + slot->data.failover = true; > > once we stop waiting for this slot? I think that would avoid confusion if one > query pg_replication_slots while we are still waiting for this slot to be synced, > thoughts? (currently we can see pg_replication_slots.synced_slot set to true > while we are still waiting). > The failover property of the slot is different from whether the slot has been synced yet, so we can't change the location of marking it but we can try to improve when to show that slot has been synced. -- With Regards, Amit Kapila.
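As a side note for anyone following along from the standby, the distinction drawn here (the slot's failover property versus whether it has actually been synced) should be observable in the catalog view, roughly as below. The synced_slot column name is taken from the earlier mention of pg_replication_slots.synced_slot in this thread; the failover column is my assumption about how the patch exposes slot->data.failover:

    SELECT slot_name, slot_type, failover, synced_slot
      FROM pg_replication_slots
     WHERE slot_type = 'logical';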
On Mon, Nov 6, 2023 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 6, 2023 at 1:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Nov 6, 2023 at 7:01 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > +static void > +WalSndGetStandbySlots(List **standby_slots, bool force) > +{ > + if (!MyReplicationSlot->data.failover) > + return; > + > + if (standby_slot_names_list == NIL && strcmp(standby_slot_names, "") != 0) > + SlotSyncInitConfig(); > + > + if (force || StandbySlotNamesPreReload == NULL || > + strcmp(StandbySlotNamesPreReload, standby_slot_names) != 0) > + { > + list_free(*standby_slots); > + > + if (StandbySlotNamesPreReload) > + pfree(StandbySlotNamesPreReload); > + > + StandbySlotNamesPreReload = pstrdup(standby_slot_names); > + *standby_slots = list_copy(standby_slot_names_list); > + } > +} > > I find this code bit difficult to understand. I think we don't need to > maintain a global variable like StandbySlotNamesPreReload. We can use > a local variable for it on the lines of what we do in > StartupRereadConfig(). Then, can we think of maintaining > standby_slot_names_list in something related to decoding like > LogicalDecodingContext as this will be used during decoding only? > Yes, agreed. This code part is now simplified in v31. PFA the patches. The overall changes are: 1) Caching of the standby_slots list in the logical-decoding context as suggested above. All the globals have been removed. 2) Dropping of local synced slots for obsolete dbs. Launcher now takes care of that. 3) There was a repeated warning in the log file due to missing GUCs as described in [1]. Fixed that. 4) Optimized code in slotsync.c and launcher.c to get rid of globals. 5) Adjusted patch003's wait-for-standby logic in slot-sync workers as per changes in pt. 1. There is still one optimization left here (in patch003) to avoid repeated parsing. I have mentioned the TODO comment. Will be targeted in the next version. The changes for 1 are in patch01. The changes for 2,3,4 are in patch02. Thanks Hou-san for implementing the changes for 1 and assisting in 5. [1]: https://www.postgresql.org/message-id/CAJpy0uDpV0suPbhCp%2B1aRLXEChD9uKp-ffBW_HfZro%3D53JKK5w%40mail.gmail.com thanks Shveta
Attachment
Hi, On 11/7/23 11:55 AM, Amit Kapila wrote: > On Tue, Nov 7, 2023 at 3:51 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 10/31/23 10:37 AM, shveta malik wrote: >>> On Fri, Oct 27, 2023 at 8:43 PM Drouvot, Bertrand >>>> >>>> Agree with your test case, but in my case I was not using pub/sub. >>>> >>>> I was not clear, so when I said: >>>> >>>>>> - create logical_slot1 on the primary (and don't start using it) >>>> >>>> I meant don't start decoding from it (like using pg_recvlogical() or >>>> pg_logical_slot_get_changes()). >>>> >>>> By using pub/sub the "don't start using it" is not satisfied. >>>> >>>> My test case is: >>>> >>>> " >>>> SELECT * FROM pg_create_logical_replication_slot('logical_slot1', 'test_decoding', false, true, true); >>>> SELECT * FROM pg_create_logical_replication_slot('logical_slot2', 'test_decoding', false, true, true); >>>> pg_recvlogical -d postgres -S logical_slot2 --no-loop --start -f - >>>> " >>>> >>> >>> Okay, I am able to reproduce it now. Thanks for clarification. I have >>> tried to change the algorithm as per suggestion by Amit in [1] >>> >>> [1]: https://www.postgresql.org/message-id/CAA4eK1KBL0110gamQfc62X%3D5JV8-Qjd0dw0Mq0o07cq6kE%2Bq%3Dg%40mail.gmail.com >> >> Thanks! >> >>> >>> This is not full proof solution but optimization over first one. Now >>> in any sync-cycle, we take 2 attempts for slots-creation (if any slots >>> are available to be created). In first attempt, we do not wait >>> indefinitely on inactive slots, we wait only for a fixed amount of >>> time and if remote-slot is still behind, then we add that to the >>> pending list and move to the next slot. Once we are done with first >>> attempt, in second attempt, we go for the pending ones and now we wait >>> on each of them until the primary catches up. >> >> Aren't we "just" postponing the "issue"? I mean if there is really no activity >> on, say, the first created slot, then once we move to the second attempt then any newly >> created slot from that time would wait to be synced forever, no? >> > > We have to wait at some point in time for such inactive slots and the > same is true even for manually created slots on standby. Do you have > any better ideas to deal with it? > What about: - get rid of the second attempt and the pending_slot_list - keep the wait_count and PrimaryCatchupWaitAttempt logic so basically, get rid of: /* * Now sync the pending slots which were failed to be created in first * attempt. */ foreach(cell, pending_slot_list) { RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); /* Wait until the primary server catches up */ PrimaryCatchupWaitAttempt = 0; synchronize_one_slot(wrconn, remote_slot, NULL); } and the pending_slot_list list. That way, for each slot that have not been created and synced yet: - it will be created on the standby - we will wait up to PrimaryCatchupWaitAttempt attempts - the slot will be synced or removed on/from the standby That way an inactive slot on the primary would not "block" any other slots on the standby. By "created" here I mean calling ReplicationSlotCreate() (not to be confused with emitting "ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name)); " which is confusing as mentioned up-thread). The problem I can see with this proposal is that the "sync" window waiting for slot activity on the primary is "only" during the PrimaryCatchupWaitAttempt attempts (as the slot will be dropped/recreated). 
If we think this window is too short we could: - increase it, or - not drop the slot once created (even if there is no activity on the primary during PrimaryCatchupWaitAttempt attempts) so that the next loop of attempts will compare with "older" LSN/xmin (as compared to dropping and re-creating the slot). That way the window would extend back to the initial slot creation. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Nov 7, 2023 at 7:58 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/7/23 11:55 AM, Amit Kapila wrote: > >>> > >>> This is not full proof solution but optimization over first one. Now > >>> in any sync-cycle, we take 2 attempts for slots-creation (if any slots > >>> are available to be created). In first attempt, we do not wait > >>> indefinitely on inactive slots, we wait only for a fixed amount of > >>> time and if remote-slot is still behind, then we add that to the > >>> pending list and move to the next slot. Once we are done with first > >>> attempt, in second attempt, we go for the pending ones and now we wait > >>> on each of them until the primary catches up. > >> > >> Aren't we "just" postponing the "issue"? I mean if there is really no activity > >> on, say, the first created slot, then once we move to the second attempt then any newly > >> created slot from that time would wait to be synced forever, no? > >> > > > > We have to wait at some point in time for such inactive slots and the > > same is true even for manually created slots on standby. Do you have > > any better ideas to deal with it? > > > > What about: > > - get rid of the second attempt and the pending_slot_list > - keep the wait_count and PrimaryCatchupWaitAttempt logic > > so basically, get rid of: > > /* > * Now sync the pending slots which were failed to be created in first > * attempt. > */ > foreach(cell, pending_slot_list) > { > RemoteSlot *remote_slot = (RemoteSlot *) lfirst(cell); > > /* Wait until the primary server catches up */ > PrimaryCatchupWaitAttempt = 0; > > synchronize_one_slot(wrconn, remote_slot, NULL); > } > > and the pending_slot_list list. > > That way, for each slot that have not been created and synced yet: > > - it will be created on the standby > - we will wait up to PrimaryCatchupWaitAttempt attempts > - the slot will be synced or removed on/from the standby > > That way an inactive slot on the primary would not "block" > any other slots on the standby. > > By "created" here I mean calling ReplicationSlotCreate() (not to be confused > with emitting "ereport(LOG, errmsg("created slot \"%s\" locally", remote_slot->name)); " > which is confusing as mentioned up-thread). > > The problem I can see with this proposal is that the "sync" window waiting > for slot activity on the primary is "only" during the PrimaryCatchupWaitAttempt > attempts (as the slot will be dropped/recreated). > > If we think this window is too short we could: > > - increase it > or > - don't drop the slot once created (even if there is no activity > on the primary during PrimaryCatchupWaitAttempt attempts) so that > the next loop of attempts will compare with "older" LSN/xmin (as compare to > dropping and re-creating the slot). That way the window would be since the > initial slot creation. > Yeah, this sounds reasonable but we can't mark such slots to be synced/available for use after failover. I think if we want to follow this approach then we need to also monitor these slots for any change in the consecutive cycles and if we are able to sync them then accordingly we enable them to use after failover. Another somewhat related point is that right now, we just wait for the change on the first slot (the patch refers to it as the monitoring slot) for computing nap_time before which we will recheck all the slots. I think we can improve that as well such that even if any slot's information is changed, we don't consider changing naptime. -- With Regards, Amit Kapila.
Hi, On 11/8/23 4:50 AM, Amit Kapila wrote: > On Tue, Nov 7, 2023 at 7:58 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> If we think this window is too short we could: >> >> - increase it >> or >> - don't drop the slot once created (even if there is no activity >> on the primary during PrimaryCatchupWaitAttempt attempts) so that >> the next loop of attempts will compare with "older" LSN/xmin (as compare to >> dropping and re-creating the slot). That way the window would be since the >> initial slot creation. >> > > Yeah, this sounds reasonable but we can't mark such slots to be > synced/available for use after failover. Yeah, currently we are fine as slots are dropped in wait_for_primary_slot_catchup() if we are not in recovery anymore. > I think if we want to follow > this approach then we need to also monitor these slots for any change > in the consecutive cycles and if we are able to sync them then > accordingly we enable them to use after failover. What about to add a new field in ReplicationSlotPersistentData indicating that we are waiting for "sync" and drop such slots during promotion and /or if not in recovery? > Another somewhat related point is that right now, we just wait for the > change on the first slot (the patch refers to it as the monitoring > slot) for computing nap_time before which we will recheck all the > slots. I think we can improve that as well such that even if any > slot's information is changed, we don't consider changing naptime. > Yeah, that sounds reasonable to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 8, 2023 at 12:32 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/8/23 4:50 AM, Amit Kapila wrote: > > > I think if we want to follow > > this approach then we need to also monitor these slots for any change > > in the consecutive cycles and if we are able to sync them then > > accordingly we enable them to use after failover. > > What about to add a new field in ReplicationSlotPersistentData > indicating that we are waiting for "sync" and drop such slots during promotion and > /or if not in recovery? > This patch is already adding a 'synced' flag in ReplicationSlotPersistentData to distinguish synced slots so that we can disallow decoding on them on the standby and disallow dropping them. I suggest we change that field to have multiple states, where one of the states would indicate that the initial sync of the slot is done. -- With Regards, Amit Kapila.
Hi, On 11/8/23 9:57 AM, Amit Kapila wrote: > On Wed, Nov 8, 2023 at 12:32 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 11/8/23 4:50 AM, Amit Kapila wrote: >> >>> I think if we want to follow >>> this approach then we need to also monitor these slots for any change >>> in the consecutive cycles and if we are able to sync them then >>> accordingly we enable them to use after failover. >> >> What about to add a new field in ReplicationSlotPersistentData >> indicating that we are waiting for "sync" and drop such slots during promotion and >> /or if not in recovery? >> > > This patch is already adding 'synced' flag in > ReplicationSlotPersistentData to distinguish synced slots so that we > can disallow decoding on then in standby and disallow to drop those. I > suggest we change that field to have multiple states where one of the > states would indicate that the initial sync of the slot is done. > Yeah, agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 8, 2023 at 3:19 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/8/23 9:57 AM, Amit Kapila wrote: > > On Wed, Nov 8, 2023 at 12:32 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 11/8/23 4:50 AM, Amit Kapila wrote: > >> > >>> I think if we want to follow > >>> this approach then we need to also monitor these slots for any change > >>> in the consecutive cycles and if we are able to sync them then > >>> accordingly we enable them to use after failover. > >> > >> What about to add a new field in ReplicationSlotPersistentData > >> indicating that we are waiting for "sync" and drop such slots during promotion and > >> /or if not in recovery? > >> > > > > This patch is already adding 'synced' flag in > > ReplicationSlotPersistentData to distinguish synced slots so that we > > can disallow decoding on then in standby and disallow to drop those. I > > suggest we change that field to have multiple states where one of the > > states would indicate that the initial sync of the slot is done. > > > > Yeah, agree. > I am working on this implementation. This sync-state is even needed for cascading standbys to know when to start syncing the slots from the first standby. It should start syncing only after the first standby has finished initialization of it (i.e. wait for primary is over) and not before that. Unrelated to above, if there is a user slot on standby with the same name which the slot-sync worker is trying to create, then shall it emit a warning and skip the sync of that slot or shall it throw an error? thanks Shveta
Hi, On 11/8/23 12:50 PM, shveta malik wrote: > On Wed, Nov 8, 2023 at 3:19 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 11/8/23 9:57 AM, Amit Kapila wrote: >>> On Wed, Nov 8, 2023 at 12:32 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> On 11/8/23 4:50 AM, Amit Kapila wrote: >>>> >>>>> I think if we want to follow >>>>> this approach then we need to also monitor these slots for any change >>>>> in the consecutive cycles and if we are able to sync them then >>>>> accordingly we enable them to use after failover. >>>> >>>> What about to add a new field in ReplicationSlotPersistentData >>>> indicating that we are waiting for "sync" and drop such slots during promotion and >>>> /or if not in recovery? >>>> >>> >>> This patch is already adding 'synced' flag in >>> ReplicationSlotPersistentData to distinguish synced slots so that we >>> can disallow decoding on then in standby and disallow to drop those. I >>> suggest we change that field to have multiple states where one of the >>> states would indicate that the initial sync of the slot is done. >>> >> >> Yeah, agree. >> > > I am working on this implementation. Thanks! > This sync-state is even needed > for cascading standbys to know when to start syncing the slots from > the first standby. It should start syncing only after the first > standby has finished initialization of it (i.e. wait for primary is > over) and not before that. > Yeah, makes sense. > Unrelated to above, if there is a user slot on standby with the same > name which the slot-sync worker is trying to create, then shall it > emit a warning and skip the sync of that slot or shall it throw an > error? > I'd vote for emit a warning and move on to the next slot if any. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > > Unrelated to above, if there is a user slot on standby with the same > > name which the slot-sync worker is trying to create, then shall it > > emit a warning and skip the sync of that slot or shall it throw an > > error? > > > > I'd vote for emit a warning and move on to the next slot if any. > But then it could take time for users to know the actual problem and they probably notice it after failover. OTOH, if we throw an error then probably they will come to know earlier because the slot sync mechanism would be stopped. Do you have reasons to prefer giving a WARNING and skipping creating such slots? I expect this WARNING to keep getting repeated in LOGs because the consecutive sync tries will again generate a WARNING. -- With Regards, Amit Kapila.
On Thu, Nov 9, 2023 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > > Unrelated to above, if there is a user slot on standby with the same > > > name which the slot-sync worker is trying to create, then shall it > > > emit a warning and skip the sync of that slot or shall it throw an > > > error? > > > > > > > I'd vote for emit a warning and move on to the next slot if any. > > > > But then it could take time for users to know the actual problem and > they probably notice it after failover. OTOH, if we throw an error > then probably they will come to know earlier because the slot sync > mechanism would be stopped. Do you have reasons to prefer giving a > WARNING and skipping creating such slots? I expect this WARNING to > keep getting repeated in LOGs because the consecutive sync tries will > again generate a WARNING. > Apart from the above, I would like to discuss the slot sync work distribution strategy of this patch. The current implementation as explained in the commit message [1] works well if the slots belong to multiple databases. It is clear from the data in emails [2][3][4] that having more workers really helps if the slots belong to multiple databases. But I think if all the slots belong to one or very few databases then such a strategy won't be as good. Now, on one hand, we get very good numbers for a particular workload with the strategy used in the patch but OTOH it may not be adaptable to various different kinds of workloads. So, I have a question whether we should try to optimize this strategy for various kinds of workloads or for the first version let's use a single-slot sync-worker and then we can enhance the functionality in later patches either in PG17 itself or in PG18 or later versions. One thing to note is that a lot of the complexity of the patch is attributed to the multi-worker strategy which may still not be efficient, so there is an argument to go with a simpler single-slot sync-worker strategy and then enhance it in future versions as we learn more about various workloads. It will also help to develop this feature incrementally instead of doing all the things in one go and taking a much longer time than it should. Thoughts? [1] - "The replication launcher on the physical standby queries primary to get the list of dbids for failover logical slots. Once it gets the dbids, if dbids < max_slotsync_workers, it starts only that many workers, and if dbids > max_slotsync_workers, it starts max_slotsync_workers and divides the work equally among them. Each worker is then responsible to keep on syncing the logical slots belonging to the DBs assigned to it. Each slot-sync worker will have its own dbids list. Since the upper limit of this dbid-count is not known, it needs to be handled using dsa. We initially allocated memory to hold 100 dbids for each worker. If this limit is exhausted, we reallocate this memory with size incremented again by 100." [2] - https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAFPTHDZw2G3Pax0smymMjfPqdPcZhMWo36f9F%2BTwNTs0HFxK%2Bw%40mail.gmail.com [4] - https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com -- With Regards, Amit Kapila.
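One rough way to judge whether the multi-worker strategy buys anything for a given installation is to look at how the failover-enabled slots are spread across databases on the primary, e.g. (database is an existing pg_replication_slots column; the failover column is the patch's proposed flag):

    SELECT database, count(*) AS failover_slots
      FROM pg_replication_slots
     WHERE slot_type = 'logical' AND failover
     GROUP BY database;

If almost everything lands in one database, the per-database split described in the commit message cannot help much, which is the concern raised above.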
On Thu, Nov 9, 2023 at 8:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > Unrelated to above, if there is a user slot on standby with the same > > > > name which the slot-sync worker is trying to create, then shall it > > > > emit a warning and skip the sync of that slot or shall it throw an > > > > error? > > > > > > > > > > I'd vote for emit a warning and move on to the next slot if any. > > > > > > > But then it could take time for users to know the actual problem and > > they probably notice it after failover. OTOH, if we throw an error > > then probably they will come to know earlier because the slot sync > > mechanism would be stopped. Do you have reasons to prefer giving a > > WARNING and skipping creating such slots? I expect this WARNING to > > keep getting repeated in LOGs because the consecutive sync tries will > > again generate a WARNING. > > > > Apart from the above, I would like to discuss the slot sync work > distribution strategy of this patch. The current implementation as > explained in the commit message [1] works well if the slots belong to > multiple databases. It is clear from the data in emails [2][3][4] that > having more workers really helps if the slots belong to multiple > databases. But I think if all the slots belong to one or very few > databases then such a strategy won't be as good. Now, on one hand, we > get very good numbers for a particular workload with the strategy used > in the patch but OTOH it may not be adaptable to various different > kinds of workloads. So, I have a question whether we should try to > optimize this strategy for various kinds of workloads or for the first > version let's use a single-slot sync-worker and then we can enhance > the functionality in later patches either in PG17 itself or in PG18 or > later versions. One thing to note is that a lot of the complexity of > the patch is attributed to the multi-worker strategy which may still > not be efficient, so there is an argument to go with a simpler > single-slot sync-worker strategy and then enhance it in future > versions as we learn more about various workloads. It will also help > to develop this feature incrementally instead of doing all the things > in one go and taking a much longer time than it should. > > Thoughts? > > [1] - "The replication launcher on the physical standby queries > primary to get the list of dbids for failover logical slots. Once it > gets the dbids, if dbids < max_slotsync_workers, it starts only that > many workers, and if dbids > max_slotsync_workers, it starts > max_slotsync_workers and divides the work equally among them. Each > worker is then responsible to keep on syncing the logical slots > belonging to the DBs assigned to it. > > Each slot-sync worker will have its own dbids list. Since the upper > limit of this dbid-count is not known, it needs to be handled using > dsa. We initially allocated memory to hold 100 dbids for each worker. > If this limit is exhausted, we reallocate this memory with size > incremented again by 100." 
> > [2] - https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com > [3] - https://www.postgresql.org/message-id/CAFPTHDZw2G3Pax0smymMjfPqdPcZhMWo36f9F%2BTwNTs0HFxK%2Bw%40mail.gmail.com > [4] - https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com > > -- > With Regards, > Amit Kapila. PFA v32 patches, which have the below changes: 1) Changed how standby_slot_names is handled. On reanalyzing, the logical decoding context might not be the best place to cache the standby slot list, because not all the callers (1. user backend, 2. walsender, 3. slotsync worker) can access the logical decoding ctx. To make access to the list consistent, cache the list in a global variable instead. Also, to avoid the trouble of allocating and freeing the list at various places, we [re]initialize the list in the GUC assign hook, which makes it easier for callers to use. 2) Changed 'bool synced' in ReplicationSlotPersistentData to 'char sync_state'. Values are: 'n': none, for user slots; 'i': sync initiated for the slot but waiting for the primary to catch up; 'r': ready for periodic syncs. 3) Improved the slot-creation logic in the slot sync worker. Now an active slot's sync is no longer blocked by an inactive slot's creation. The worker pings the primary server a fixed number of times and waits for it to catch up with the local slot's LSN, after which it moves to the next slot. The worker reattempts the wait for pending ones in the next sync-cycle. Meanwhile any such slot (waiting for the primary to catch up) is not dropped, but its sync_state is marked as 'i'. Once the worker finishes initialization for such a slot (in any of the sync-cycles), the slot's sync_state is changed to 'r'. 4) The slots with state 'i' are dropped by the slot-sync worker when it finds out that it is no longer in standby mode, and then it exits. 5) A cascading standby does not sync slots with 'sync_state' = 'i' from the first standby. 6) Changed the naptime computation logic. Now, during each sync-cycle, if any of the received slots was updated we retain the default naptime; otherwise we increase the naptime once the inactivity time reaches a threshold. 7) Added a warning for the case where a user slot with the same name as the one the slot-sync worker is trying to create is already present. Sync for such slots is skipped. Changes for 1 are in patch001 and patch003. Changes for 2-7 are in patch002. Thank you Hou-san for working on 1. Open Question: 1) Currently I have put the drop-slot logic for slots with 'sync_state=i' in the slot-sync worker. Do we need to put it somewhere in the promotion logic as well? Perhaps in WaitForWALToBecomeAvailable() where we call XLogShutdownWalRcv after checking 'CheckForStandbyTrigger'. Thoughts? thanks Shveta
Attachment
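For readers following the state names used throughout the rest of this thread, the three values described under change 2) presumably map onto a char field plus constants along these lines. This is a sketch only: SYNCSLOT_STATE_NONE and SYNCSLOT_STATE_INITIATED do appear in later code excerpts, the remaining names and the helper are assumptions.

    /* Possible values of a slot's sync_state (replacing the old 'bool synced'). */
    #define SYNCSLOT_STATE_NONE         'n' /* user slot, never synchronized */
    #define SYNCSLOT_STATE_INITIATED    'i' /* sync initiated, still waiting for
                                             * the primary to catch up */
    #define SYNCSLOT_STATE_READY        'r' /* ready for periodic syncs */

    /* Illustrative helper: only 'r' slots are trustworthy after a failover. */
    static inline bool
    SyncSlotIsReady(char sync_state)
    {
        return sync_state == SYNCSLOT_STATE_READY;
    }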
Hi, On 11/9/23 3:41 AM, Amit Kapila wrote: > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >>> Unrelated to above, if there is a user slot on standby with the same >>> name which the slot-sync worker is trying to create, then shall it >>> emit a warning and skip the sync of that slot or shall it throw an >>> error? >>> >> >> I'd vote for emit a warning and move on to the next slot if any. >> > > But then it could take time for users to know the actual problem and > they probably notice it after failover. Right, that's not appealing.... OTOH the slot has already been created manually on the standby so there is probably already a "use case" for it (that is probably unrelated to the failover story then). In V32, the following states have been introduced: " 'n': none for user slots, 'i': sync initiated for the slot but waiting for primary to catch up. 'r': ready for periodic syncs. " Should we introduce a new state that indicates that a sync slot creation has failed because the slot already existed? That would probably be simple to monitor instead of looking at the log file. > OTOH, if we throw an error > then probably they will come to know earlier because the slot sync > mechanism would be stopped. Right. > Do you have reasons to prefer giving a > WARNING and skipping creating such slots? My idea was that with a WARNING it won't block others slot creation (if any). > I expect this WARNING to > keep getting repeated in LOGs because the consecutive sync tries will > again generate a WARNING. > Yes. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/9/23 11:54 AM, shveta malik wrote: > > PFA v32 patches which has below changes: Thanks! > 7) Added warning for cases where a user-slot with the same name is > already present which slot-sync worker is trying to create. Sync for > such slots is skipped. I'm seeing assertion and segfault in this case due to ReplicationSlotRelease() in synchronize_one_slot(). Adding this extra check prior to it: - ReplicationSlotRelease(); + if (!(found && s->data.sync_state == SYNCSLOT_STATE_NONE)) + ReplicationSlotRelease(); make them disappear. > > Open Question: > 1) Currently I have put drop slot logic for slots with 'sync_state=i' > in slot-sync worker. Do we need to put it somewhere in promotion-logic > as well? Yeah I think so, because there is a time window when one could "use" the slot after the promotion and before it is removed. Producing things like: " 2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready 2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready 2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready 2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628 " After the promotion one was able to use logical_slot5 and now we can now drop it. > Perhaps in WaitForWALToBecomeAvailable() where we call > XLogShutdownWalRcv after checking 'CheckForStandbyTrigger'. Thoughts? > You mean here? /* * Check to see if promotion is requested. Note that we do * this only after failure, so when you promote, we still * finish replaying as much as we can from archive and * pg_wal before failover. */ if (StandbyMode && CheckForStandbyTrigger()) { XLogShutdownWalRcv(); return XLREAD_FAIL; } If so, that sounds like a good place to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/9/23 11:54 AM, shveta malik wrote: > > > > PFA v32 patches which has below changes: > > Thanks! > > > 7) Added warning for cases where a user-slot with the same name is > > already present which slot-sync worker is trying to create. Sync for > > such slots is skipped. > > I'm seeing assertion and segfault in this case due to ReplicationSlotRelease() > in synchronize_one_slot(). > > Adding this extra check prior to it: > > - ReplicationSlotRelease(); > + if (!(found && s->data.sync_state == SYNCSLOT_STATE_NONE)) > + ReplicationSlotRelease(); > > make them disappear. > Oh, I see. Thanks for pointing it out. I will fix it in the next version. > > > > Open Question: > > 1) Currently I have put drop slot logic for slots with 'sync_state=i' > > in slot-sync worker. Do we need to put it somewhere in promotion-logic > > as well? > > Yeah I think so, because there is a time window when one could "use" the slot > after the promotion and before it is removed. Producing things like: > > " > 2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready > 2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready > 2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready > 2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628 > " > > After the promotion one was able to use logical_slot5 and now we can now drop it. Yes, I was suspicious about this small window which may allow others to use this slot, that is why I was thinking of putting it in the promotion flow and thus asked that question earlier. But the slot-sync worker may end up creating it again in case it has not exited. So we need to carefully decide at which places we need to put 'not-in recovery' checks in slot-sync workers. In the previous version, synchronize_one_slot() had that check and it was skipping sync if '!RecoveryInProgress'. But I have removed that check in v32, thinking that the slots which the worker has already fetched from the primary should all get synced, and the worker should exit after that instead of syncing half and leaving the rest. But now, on rethinking, was the previous behaviour correct, i.e. should we skip the sync from the point onward where we see we are no longer in standby mode, even though a few of the slots have already been synced in that sync-cycle? Thoughts? > > > Perhaps in WaitForWALToBecomeAvailable() where we call > > XLogShutdownWalRcv after checking 'CheckForStandbyTrigger'. Thoughts? > > > > You mean here? > > /* > * Check to see if promotion is requested. Note that we do > * this only after failure, so when you promote, we still > * finish replaying as much as we can from archive and > * pg_wal before failover. > */ > if (StandbyMode && CheckForStandbyTrigger()) > { > XLogShutdownWalRcv(); > return XLREAD_FAIL; > } > yes, here. > If so, that sounds like a good place to me. okay. Will add it. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Thu, Nov 9, 2023 at 7:29 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/9/23 3:41 AM, Amit Kapila wrote: > > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >>> Unrelated to above, if there is a user slot on standby with the same > >>> name which the slot-sync worker is trying to create, then shall it > >>> emit a warning and skip the sync of that slot or shall it throw an > >>> error? > >>> > >> > >> I'd vote for emit a warning and move on to the next slot if any. > >> > > > > But then it could take time for users to know the actual problem and > > they probably notice it after failover. > > Right, that's not appealing.... > > OTOH the slot has already been created manually on the standby so there is > probably already a "use case" for it (that is probably unrelated to the > failover story then). > > In V32, the following states have been introduced: > > " > 'n': none for user slots, > 'i': sync initiated for the slot but waiting for primary to catch up. > 'r': ready for periodic syncs. > " > > Should we introduce a new state that indicates that a sync slot creation > has failed because the slot already existed? That would probably > be simple to monitor instead of looking at the log file. > Are you saying that we change the state of the already existing slot on standby? And, such a state would indicate that we are trying to sync the slot with the same name from the primary. Is that what you have in mind? If so, it appears quite odd to me to have such a state and also set it in some unrelated slot that just has the same name. I understand your point that we can allow other slots to proceed but it is also important to not create any sort of inconsistency that can surprise user after failover. Also, the current coding doesn't ensure we will always give WARNING. If we see the below code that deals with this WARNING, + /* User created slot with the same name exists, emit WARNING. */ + else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE) + { + ereport(WARNING, + errmsg("not synchronizing slot %s; it is a user created slot", + remote_slot->name)); + } + /* Otherwise create the slot first. */ + else + { + TransactionId xmin_horizon = InvalidTransactionId; + ReplicationSlot *slot; + + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, + remote_slot->two_phase, false); I think this is not a solid check to ensure that the slot existed before. Because it could be created as soon as the slot sync worker invokes ReplicationSlotCreate() here. So, depending on the timing, we can either get an ERROR or WARNING. I feel giving an ERROR in this case should be okay. -- With Regards, Amit Kapila.
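If the conclusion is to error out, the quoted branch would presumably shrink to something like the following. This is only a sketch: the errcode and wording are placeholders, and found, s and remote_slot are the variables from the excerpt above.

    /* Sketch: stop slot synchronization if a user-created slot already
     * occupies the name we are asked to synchronize. */
    if (found && s->data.sync_state == SYNCSLOT_STATE_NONE)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("cannot synchronize replication slot \"%s\"",
                        remote_slot->name),
                 errdetail("A user-created slot with the same name already exists on the standby.")));

Since ReplicationSlotCreate() already errors out when the name is taken, the race Amit mentions resolves to an ERROR either way; this check merely produces a more descriptive message in the common case.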
Hi, On 11/10/23 6:41 AM, Amit Kapila wrote: > On Thu, Nov 9, 2023 at 7:29 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > Are you saying that we change the state of the already existing slot > on standby? Yes. > And, such a state would indicate that we are trying to > sync the slot with the same name from the primary. Is that what you > have in mind? Yes. > If so, it appears quite odd to me to have such a state > and also set it in some unrelated slot that just has the same name. > > I understand your point that we can allow other slots to proceed but > it is also important to not create any sort of inconsistency that can > surprise user after failover. But even if we ERROR out instead of emitting a WARNING, the user would still need to be notified/monitor such errors. I agree that then probably they will come to know earlier because the slot sync mechanism would be stopped but still it is not "guaranteed" (specially if there is no others "working" synced slots around.) And if they do not, then there is still a risk to use this slot after a failover thinking this is a "synced" slot. Giving more thoughts, what about using a dedicated/reserved naming convention for synced slot like synced_<primary_slot_name> or such and then: - prevent user to create sync_<whatever> slots on standby - sync <slot> on primary to sync_<slot> on standby - during failover, rename sync_<slot> to <slot> and if <slot> exists then emit a WARNING and keep sync_<slot> in place. That way both slots are still in place (the manually created <slot> and the sync_<slot<) and one could decide what to do with them. I don't think we'd need to worry about the cases where sync_ slot could be already created before we "prevent" such slots creation. Indeed I think they would not survive pg_upgrade before 17 -> 18 upgrades. So it looks like we'd be good as long as we are able to prevent sync_ slots creation on 17. Thoughts? > Also, the current coding doesn't ensure > we will always give WARNING. If we see the below code that deals with > this WARNING, > > + /* User created slot with the same name exists, emit WARNING. */ > + else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE) > + { > + ereport(WARNING, > + errmsg("not synchronizing slot %s; it is a user created slot", > + remote_slot->name)); > + } > + /* Otherwise create the slot first. */ > + else > + { > + TransactionId xmin_horizon = InvalidTransactionId; > + ReplicationSlot *slot; > + > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > + remote_slot->two_phase, false); > > I think this is not a solid check to ensure that the slot existed > before. Because it could be created as soon as the slot sync worker > invokes ReplicationSlotCreate() here. Agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Nov 10, 2023 at 12:50 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/10/23 6:41 AM, Amit Kapila wrote: > > On Thu, Nov 9, 2023 at 7:29 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > Are you saying that we change the state of the already existing slot > > on standby? > > Yes. > > > And, such a state would indicate that we are trying to > > sync the slot with the same name from the primary. Is that what you > > have in mind? > > Yes. > > > If so, it appears quite odd to me to have such a state > > and also set it in some unrelated slot that just has the same name. > > > > > I understand your point that we can allow other slots to proceed but > > it is also important to not create any sort of inconsistency that can > > surprise user after failover. > > But even if we ERROR out instead of emitting a WARNING, the user would still > need to be notified/monitor such errors. I agree that then probably they will > come to know earlier because the slot sync mechanism would be stopped but still > it is not "guaranteed" (specially if there is no others "working" synced slots > around.) > > And if they do not, then there is still a risk to use this slot after a > failover thinking this is a "synced" slot. > I think this is another reason that probably giving ERROR has better chances for the user to notice before failover. IF knowing such errors user still proceeds with the failover, the onus is on her. We can probably document this hazard along with the failover feature so that users are aware that they either need to be careful while creating slots on standby or consult ERROR logs. I guess we can even make it visible in the view also. > Giving more thoughts, what about using a dedicated/reserved naming convention for > synced slot like synced_<primary_slot_name> or such and then: > > - prevent user to create sync_<whatever> slots on standby > - sync <slot> on primary to sync_<slot> on standby > - during failover, rename sync_<slot> to <slot> and if <slot> exists then > emit a WARNING and keep sync_<slot> in place. > > That way both slots are still in place (the manually created <slot> and > the sync_<slot<) and one could decide what to do with them. > Hmm, I think after failover, users need to rename all slots or we need to provide a way to rename them so that they can be used by subscribers which sounds like much more work. > > Also, the current coding doesn't ensure > > we will always give WARNING. If we see the below code that deals with > > this WARNING, > > > > + /* User created slot with the same name exists, emit WARNING. */ > > + else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE) > > + { > > + ereport(WARNING, > > + errmsg("not synchronizing slot %s; it is a user created slot", > > + remote_slot->name)); > > + } > > + /* Otherwise create the slot first. */ > > + else > > + { > > + TransactionId xmin_horizon = InvalidTransactionId; > > + ReplicationSlot *slot; > > + > > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > > + remote_slot->two_phase, false); > > > > I think this is not a solid check to ensure that the slot existed > > before. Because it could be created as soon as the slot sync worker > > invokes ReplicationSlotCreate() here. > > Agree. > So, having a concrete check to give WARNING would require some more logic which I don't think is a good idea to handle this boundary case. -- With Regards, Amit Kapila.
On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > > You mean here? > > /* > * Check to see if promotion is requested. Note that we do > * this only after failure, so when you promote, we still > * finish replaying as much as we can from archive and > * pg_wal before failover. > */ > if (StandbyMode && CheckForStandbyTrigger()) > { > XLogShutdownWalRcv(); > return XLREAD_FAIL; > } > > If so, that sounds like a good place to me. > One more thing to think about is whether we want to shut down syncslot workers as well on promotion similar to walreceiver? Because we don't want them to even attempt once to sync after promotion. -- With Regards, Amit Kapila.
Hi, On 11/10/23 8:55 AM, Amit Kapila wrote: > On Fri, Nov 10, 2023 at 12:50 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> But even if we ERROR out instead of emitting a WARNING, the user would still >> need to be notified/monitor such errors. I agree that then probably they will >> come to know earlier because the slot sync mechanism would be stopped but still >> it is not "guaranteed" (specially if there is no others "working" synced slots >> around.) > >> >> And if they do not, then there is still a risk to use this slot after a >> failover thinking this is a "synced" slot. >> > > I think this is another reason that probably giving ERROR has better > chances for the user to notice before failover. IF knowing such errors > user still proceeds with the failover, the onus is on her. Agree. My concern is more when they don't know about the error. > We can > probably document this hazard along with the failover feature so that > users are aware that they either need to be careful while creating > slots on standby or consult ERROR logs. I guess we can even make it > visible in the view also. Yeah. >> Giving more thoughts, what about using a dedicated/reserved naming convention for >> synced slot like synced_<primary_slot_name> or such and then: >> >> - prevent user to create sync_<whatever> slots on standby >> - sync <slot> on primary to sync_<slot> on standby >> - during failover, rename sync_<slot> to <slot> and if <slot> exists then >> emit a WARNING and keep sync_<slot> in place. >> >> That way both slots are still in place (the manually created <slot> and >> the sync_<slot<) and one could decide what to do with them. >> > > Hmm, I think after failover, users need to rename all slots or we need > to provide a way to rename them so that they can be used by > subscribers which sounds like much more work. Agree that's much more work for the subscriber case. Maybe that's not worth the extra work. >>> Also, the current coding doesn't ensure >>> we will always give WARNING. If we see the below code that deals with >>> this WARNING, >>> >>> + /* User created slot with the same name exists, emit WARNING. */ >>> + else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE) >>> + { >>> + ereport(WARNING, >>> + errmsg("not synchronizing slot %s; it is a user created slot", >>> + remote_slot->name)); >>> + } >>> + /* Otherwise create the slot first. */ >>> + else >>> + { >>> + TransactionId xmin_horizon = InvalidTransactionId; >>> + ReplicationSlot *slot; >>> + >>> + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, >>> + remote_slot->two_phase, false); >>> >>> I think this is not a solid check to ensure that the slot existed >>> before. Because it could be created as soon as the slot sync worker >>> invokes ReplicationSlotCreate() here. >> >> Agree. >> > > So, having a concrete check to give WARNING would require some more > logic which I don't think is a good idea to handle this boundary case. > Yeah good point, agree to just error out in all the case then (if we discard the sync_ reserved wording proposal, which seems to be the case as probably not worth the extra work). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/10/23 4:31 AM, shveta malik wrote: > On Thu, Nov 9, 2023 at 9:15 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> Yeah I think so, because there is a time window when one could "use" the slot >> after the promotion and before it is removed. Producing things like: >> >> " >> 2023-11-09 15:16:50.294 UTC [2580462] LOG: dropped replication slot "logical_slot2" of dbid 5 as it was not sync-ready >> 2023-11-09 15:16:50.295 UTC [2580462] LOG: dropped replication slot "logical_slot3" of dbid 5 as it was not sync-ready >> 2023-11-09 15:16:50.297 UTC [2580462] LOG: dropped replication slot "logical_slot4" of dbid 5 as it was not sync-ready >> 2023-11-09 15:16:50.297 UTC [2580462] ERROR: replication slot "logical_slot5" is active for PID 2594628 >> " >> >> After the promotion one was able to use logical_slot5 and now we can now drop it. > > Yes, I was suspicious about this small window which may allow others > to use this slot, that is why I was thinking of putting it in the > promotion flow and thus asked that question earlier. But the slot-sync > worker may end up creating it again in case it has not exited. Sorry, there is a typo up-thread, I meant "After the promotion one was able to use logical_slot5 and now we can NOT drop it.". We can not drop it because it is in use. > So we > need to carefully decide at what all places we need to put 'not-in > recovery' checks in slot-sync workers. In the previous version, > synchronize_one_slot() had that check and it was skipping sync if > '!RecoveryInProgress'. But I have removed that check in v32 thinking > that the slots which the worker has already fetched from the primary, > let them all get synced and exit after that nstead of syncing half > and leaving rest. But now on rethinking, was the previous behaviour > correct i.e. skip sync at that point onward where we see it is no > longer in standby-mode while few of the slots have already been synced > in that sync-cycle. Thoughts? > I think we still need to think/discuss the promotion flow. I think we would need to have the slot sync worker shutdown during the promotion (as suggested by Amit in [1]) but before that let the sync slot worker knows it is now acting during promotion. Something like: - let the sync worker know it is now acting under promotion - do what needs to be done while acting under promotion - shutdown the sync worker That way we would avoid any "risk" of having the sync worker doing something we don't expect while not in recovery anymore. Regarding "do what needs to be done while acting under promotion": - Ensure all slots in 'r' state are synced - drop slots that are in 'i' state Thoughts? [1]: https://www.postgresql.org/message-id/CAA4eK1J2Pc%3D5TOgty5u4bp--y7ZHaQx3_2eWPL%3DVPJ7A_0JF2g%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, November 9, 2023 6:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > > PFA v32 patches which has below changes: Thanks for updating the patch. Here are few comments: 1. Do we need to update the slot upgrade code in pg_upgrade to upgrade the slot's failover property as well ? 2. The check for wal_level < WAL_LEVEL_LOGICAL. It seems the existing codes disallow slot creation if wal_level is not sufficient, I am thinking we might need similar check in slot sync worker. Otherwise, the synced slot could not be used after standby promotion. Best Regards, Hou zj
On Mon, Nov 13, 2023 at 6:19 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, November 9, 2023 6:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > PFA v32 patches which has below changes: > > Thanks for updating the patch. > > Here are few comments: > > > 2. > The check for wal_level < WAL_LEVEL_LOGICAL. > > It seems the existing codes disallow slot creation if wal_level is not > sufficient, I am thinking we might need similar check in slot sync worker. > Otherwise, the synced slot could not be used after standby promotion. > Yes, I agree. We should add this check. > Best Regards, > Hou zj
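For reference, the check being agreed on here would presumably mirror what logical slot creation already does, placed somewhere early in the slot sync worker. A sketch only; the exact placement and wording are open:

    /* Sketch: synced logical slots are unusable unless the standby itself
     * runs with a sufficient wal_level, so refuse to start syncing. */
    if (wal_level < WAL_LEVEL_LOGICAL)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("replication slot synchronization requires wal_level >= \"logical\"")));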
On Thu, Nov 9, 2023 at 8:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Nov 8, 2023 at 8:09 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > Unrelated to above, if there is a user slot on standby with the same > > > > name which the slot-sync worker is trying to create, then shall it > > > > emit a warning and skip the sync of that slot or shall it throw an > > > > error? > > > > > > > > > > I'd vote for emit a warning and move on to the next slot if any. > > > > > > > But then it could take time for users to know the actual problem and > > they probably notice it after failover. OTOH, if we throw an error > > then probably they will come to know earlier because the slot sync > > mechanism would be stopped. Do you have reasons to prefer giving a > > WARNING and skipping creating such slots? I expect this WARNING to > > keep getting repeated in LOGs because the consecutive sync tries will > > again generate a WARNING. > > > > Apart from the above, I would like to discuss the slot sync work > distribution strategy of this patch. The current implementation as > explained in the commit message [1] works well if the slots belong to > multiple databases. It is clear from the data in emails [2][3][4] that > having more workers really helps if the slots belong to multiple > databases. But I think if all the slots belong to one or very few > databases then such a strategy won't be as good. Now, on one hand, we > get very good numbers for a particular workload with the strategy used > in the patch but OTOH it may not be adaptable to various different > kinds of workloads. So, I have a question whether we should try to > optimize this strategy for various kinds of workloads or for the first > version let's use a single-slot sync-worker and then we can enhance > the functionality in later patches either in PG17 itself or in PG18 or > later versions. I can work on separating the patch. We can first focus on single worker design and then we can work on multi-worker design either immediately (if needed) or we can target it in the second draft of the patch. I would like to know the thoughts of others on this. One thing to note is that a lot of the complexity of > the patch is attributed to the multi-worker strategy which may still > not be efficient, so there is an argument to go with a simpler > single-slot sync-worker strategy and then enhance it in future > versions as we learn more about various workloads. It will also help > to develop this feature incrementally instead of doing all the things > in one go and taking a much longer time than it should. > Agreed. With multi-workers, a lot of complexity (dsa, locks etc) have come into play. We can decide better on our workload distribution strategy among workers once we have more clarity on different types of workloads. > > [1] - "The replication launcher on the physical standby queries > primary to get the list of dbids for failover logical slots. Once it > gets the dbids, if dbids < max_slotsync_workers, it starts only that > many workers, and if dbids > max_slotsync_workers, it starts > max_slotsync_workers and divides the work equally among them. Each > worker is then responsible to keep on syncing the logical slots > belonging to the DBs assigned to it. > > Each slot-sync worker will have its own dbids list. Since the upper > limit of this dbid-count is not known, it needs to be handled using > dsa. 
We initially allocated memory to hold 100 dbids for each worker. > If this limit is exhausted, we reallocate this memory with size > incremented again by 100." > > [2] - https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com > [3] - https://www.postgresql.org/message-id/CAFPTHDZw2G3Pax0smymMjfPqdPcZhMWo36f9F%2BTwNTs0HFxK%2Bw%40mail.gmail.com > [4] - https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com > > -- > With Regards, > Amit Kapila.
On Thu, Nov 9, 2023 at 9:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v32 patches which has below changes: Testing with this patch, I see that if the failover enabled slot is invalidated on the primary, then the corresponding synced slot is not invalidated on the standby. Instead, I see that it continuously gets the below error: " WARNING: not synchronizing slot sub; synchronization would move it backwards" In the code, I see that: if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) { ereport(WARNING, errmsg("not synchronizing slot %s; synchronization would move" " it backwards", remote_slot->name)); ReplicationSlotRelease(); CommitTransactionCommand(); return; } If the restart_lsn of the remote slot is behind, then the local_slot_update() function is never called to set the invalidation status on the local slot. And for invalidated slots, restart_lsn is always NULL. regards, Ajin Cherian Fujitsu Australia
On Mon, Nov 13, 2023 at 11:02 AM Ajin Cherian <itsajin@gmail.com> wrote: > > On Thu, Nov 9, 2023 at 9:54 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v32 patches which has below changes: > Testing with this patch, I see that if the failover enabled slot is > invalidated on the primary, then the corresponding synced slot is not > invalidated on the standby. Instead, I see that it continuously gets > the below error: > " WARNING: not synchronizing slot sub; synchronization would move it backwards" > > In the code, I see that: > if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) > { > ereport(WARNING, > errmsg("not synchronizing slot %s; synchronization > would move" > " it backwards", remote_slot->name)); > > ReplicationSlotRelease(); > CommitTransactionCommand(); > return; > } > > If the restart_lsn of the remote slot is behind, then the > local_slot_update() function is never called to set the invalidation > status on the local slot. And for invalidated slots, restart_lsn is > always NULL. > Okay. Thanks for testing Ajin. I think it needs a fix wherein we set the local-slot's invalidation status (provided it is not invalidated already) from the remote slot before this check itself. And if the slot is invalidated locally (either by itself) or by primary_slot being invalidated, then we should skip the sync. I will fix this in the next version. thanks Shveta
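In code terms, the fix being described would presumably look roughly like the following, placed before the "would move it backwards" check quoted above. A sketch only: remote_slot->invalidated is assumed to carry the invalidation cause fetched from the primary, everything else uses the existing slot fields.

    /* Propagate the primary's invalidation to the synced slot first. */
    if (remote_slot->invalidated != RS_INVAL_NONE &&
        MyReplicationSlot->data.invalidated == RS_INVAL_NONE)
    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        MyReplicationSlot->data.invalidated = remote_slot->invalidated;
        SpinLockRelease(&MyReplicationSlot->mutex);

        ReplicationSlotMarkDirty();
        ReplicationSlotSave();
    }

    /* An invalidated slot (locally or on the primary) is not synced further. */
    if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
    {
        ReplicationSlotRelease();
        CommitTransactionCommand();
        return;
    }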
On Mon, Nov 13, 2023 at 5:38 PM shveta malik <shveta.malik@gmail.com> wrote: > Okay. Thanks for testing Ajin. I think it needs a fix wherein we set > the local-slot's invalidation status (provided it is not invalidated > already) from the remote slot before this check itself. And if the > slot is invalidated locally (either by itself) or by primary_slot > being invalidated, then we should skip the sync. I will fix this in > the next version. Yes, that works. Another bug I see in my testing is that pg_get_slot_invalidation_cause() does not release the LOCK if it finds the slot it is searching for. regards, Ajin Cherian Fujitsu Australia
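For context, the usual pattern when scanning the slot array is to take ReplicationSlotControlLock in shared mode and release it on every exit path, so the fix is presumably just making sure the "found" case falls through to the release as well. A sketch of the pattern rather than the patch's actual function body; slot_name is assumed to be the function's Name argument.

    ReplicationSlotInvalidationCause cause = RS_INVAL_NONE;

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        if (s->in_use &&
            strcmp(NameStr(s->data.name), NameStr(*slot_name)) == 0)
        {
            cause = s->data.invalidated;
            break;              /* fall through to the release below */
        }
    }
    LWLockRelease(ReplicationSlotControlLock);  /* reached on both paths */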
Hi, On 11/13/23 5:24 AM, shveta malik wrote: > On Thu, Nov 9, 2023 at 8:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> Apart from the above, I would like to discuss the slot sync work >> distribution strategy of this patch. The current implementation as >> explained in the commit message [1] works well if the slots belong to >> multiple databases. It is clear from the data in emails [2][3][4] that >> having more workers really helps if the slots belong to multiple >> databases. But I think if all the slots belong to one or very few >> databases then such a strategy won't be as good. Now, on one hand, we >> get very good numbers for a particular workload with the strategy used >> in the patch but OTOH it may not be adaptable to various different >> kinds of workloads. So, I have a question whether we should try to >> optimize this strategy for various kinds of workloads or for the first >> version let's use a single-slot sync-worker and then we can enhance >> the functionality in later patches either in PG17 itself or in PG18 or >> later versions. > > I can work on separating the patch. We can first focus on single > worker design and then we can work on multi-worker design either > immediately (if needed) or we can target it in the second draft of the > patch. I would like to know the thoughts of others on this. If we need to put more thoughts on the workers distribution strategy then I also think it's better to focus on a single worker and then improve/discuss a distribution design later on. > > One thing to note is that a lot of the complexity of >> the patch is attributed to the multi-worker strategy which may still >> not be efficient, so there is an argument to go with a simpler >> single-slot sync-worker strategy and then enhance it in future >> versions as we learn more about various workloads. It will also help >> to develop this feature incrementally instead of doing all the things >> in one go and taking a much longer time than it should. > > Agreed. With multi-workers, a lot of complexity (dsa, locks etc) have > come into play. We can decide better on our workload distribution > strategy among workers once we have more clarity on different types of > workloads. > Agreed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > > >>> Also, the current coding doesn't ensure we will always give WARNING. > >>> If we see the below code that deals with this WARNING, > >>> > >>> + /* User created slot with the same name exists, emit WARNING. */ > >>> + else if (found && s->data.sync_state == SYNCSLOT_STATE_NONE) > >>> + { > >>> + ereport(WARNING, > >>> + errmsg("not synchronizing slot %s; it is a user created slot", > >>> + remote_slot->name)); > >>> + } > >>> + /* Otherwise create the slot first. */ > >>> + else > >>> + { > >>> + TransactionId xmin_horizon = InvalidTransactionId; > >>> + ReplicationSlot *slot; > >>> + > >>> + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > >>> + remote_slot->two_phase, false); > >>> > >>> I think this is not a solid check to ensure that the slot existed > >>> before. Because it could be created as soon as the slot sync worker > >>> invokes ReplicationSlotCreate() here. > >> > >> Agree. > >> > > > > So, having a concrete check to give WARNING would require some more > > logic which I don't think is a good idea to handle this boundary case. > > > > Yeah good point, agree to just error out in all the case then (if we discard the > sync_ reserved wording proposal, which seems to be the case as probably not > worth the extra work). Thanks for the discussion! Here is the V33 patch set which includes the following changes: 1) Drop slots with state 'i' in the promotion flow after we shut down the WalReceiver. 2) Raise an error if a user slot with the same name already exists on the standby. 3) Hold the spinlock when updating the properties of the replication slot, and in places that were reading the slot's info without acquiring it. 4) Fixed a bug: if a slot is invalidated on the standby but the standby is restarted immediately after that, then it fails to recreate that slot and instead ends up syncing it again. It is fixed by checking the invalidated flag after acquiring the slot and skipping the sync for invalidated slots. 5) Fixed the bugs reported by Ajin [1][2]. 6) Removed some unused variables. Thanks Shveta for working on 1) 2) 4) and 5). Thanks Ajin for testing 5). --- TODO There are a few pending comments and bugs that have not been addressed; I will work on them in the next version: 1) Comments posted by me [3]. 2) Shut down the slotsync worker on promotion and probably let slotsync do the necessary cleanups [4]. 3) Consider documenting the hazard that creating a slot on the standby may cause ERRORs in the log, and consider making it visible in the view. 4) One bug found internally: if we give a non-existing dbname in primary_conninfo on the standby, it should be handled gracefully; the launcher should skip slot synchronization. 5) One bug found internally: when wait_for_primary_slot_catchup is doing WaitLatch and I stop the standby using: ./pg_ctl -D ../../standbydb/ -l standby.log stop, it does not come out of the wait and the shutdown hangs. [1] https://www.postgresql.org/message-id/CAFPTHDZNV_HFAXULkaJOv_MMtLukCzDEgTaixxBwjEO_0Jg-kg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAFPTHDa5C_vHQbeqemToyucWySB0kEFbdS2WOA0PB%2BGSei2v7A%40mail.gmail.com [3] https://www.postgresql.org/message-id/OS0PR01MB571652CCD42F1D08D5BD69D494B3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com [4] https://www.postgresql.org/message-id/64056e35-1916-461c-a816-26e40ffde3a0%40gmail.com Best Regards, Hou zj
Attachment
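Regarding change 3), the convention elsewhere in slot.c is that the fields under ReplicationSlot.data are read and updated under the slot's spinlock once the slot is visible to other backends, so the sync worker's update of a synced slot would presumably take this shape. A sketch; the remote_slot field names are assumptions about the values fetched from the primary.

    /* Update the local slot from the values fetched off the primary. */
    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.restart_lsn = remote_slot->restart_lsn;
    MyReplicationSlot->data.confirmed_flush = remote_slot->confirmed_lsn;
    MyReplicationSlot->data.catalog_xmin = remote_slot->catalog_xmin;
    SpinLockRelease(&MyReplicationSlot->mutex);

    /* Dirty the slot so the change is written out at the next checkpoint. */
    ReplicationSlotMarkDirty();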
Hi, On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote: > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: >> Yeah good point, agree to just error out in all the case then (if we discard the >> sync_ reserved wording proposal, which seems to be the case as probably not >> worth the extra work). > > Thanks for the discussion! > > Here is the V33 patch set which includes the following changes: Thanks for working on it! > > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver. @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, * this only after failure, so when you promote, we still * finish replaying as much as we can from archive and * pg_wal before failover. + * + * Drop the slots for which sync is initiated but not yet + * completed i.e. they are still waiting for the primary + * server to catch up. */ if (StandbyMode && CheckForStandbyTrigger()) { XLogShutdownWalRcv(); + slotsync_drop_initiated_slots(); return XLREAD_FAIL; } I had a closer look and it seems this is not located at the right place. Indeed, it's added here: switch (currentSource) { case XLOG_FROM_ARCHIVE: case XLOG_FROM_PG_WAL: While in our case we are in case XLOG_FROM_STREAM: So I think we should move slotsync_drop_initiated_slots() into the XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker? (the TODO item number 2 you mentioned up-thread) BTW, in order to prevent any corner case, wouldn't it also be better to replace: + /* + * Do not allow consumption of a "synchronized" slot until the standby + * gets promoted. + */ + if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) with something like: if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || slot->data.sync_state == SYNCSLOT_STATE_INITIATED) to ensure slots in the 'i' state can never be used? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Nov 14, 2023 at 7:56 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote: > > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > >> Yeah good point, agree to just error out in all the case then (if we discard the > >> sync_ reserved wording proposal, which seems to be the case as probably not > >> worth the extra work). > > > > Thanks for the discussion! > > > > Here is the V33 patch set which includes the following changes: > > Thanks for working on it! > > > > > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver. > > @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, > * this only after failure, so when you promote, we still > * finish replaying as much as we can from archive and > * pg_wal before failover. > + * > + * Drop the slots for which sync is initiated but not yet > + * completed i.e. they are still waiting for the primary > + * server to catch up. > */ > if (StandbyMode && CheckForStandbyTrigger()) > { > XLogShutdownWalRcv(); > + slotsync_drop_initiated_slots(); > return XLREAD_FAIL; > } > > I had a closer look and it seems this is not located at the right place. > > Indeed, it's added here: > > switch (currentSource) > { > case XLOG_FROM_ARCHIVE: > case XLOG_FROM_PG_WAL: > > While in our case we are in > > case XLOG_FROM_STREAM: > > So I think we should move slotsync_drop_initiated_slots() in the > XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker? > (the TODO item number 2 you mentioned up-thread) > > BTW in order to prevent any corner case, would'nt also be better to > > replace: > > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) > > with something like: > > if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || slot->data.sync_state == SYNCSLOT_STATE_INITIATED) > > to ensure slots in 'i' case can never be used? > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com PFA v34. It has changed patch002 from multi workers to single worker design as per the discussion in [1] and [2]. Please note that the TODO list mentioned in [3] is still pending and will be implemented in next version. [1]: https://www.postgresql.org/message-id/CAA4eK1JzYoHu2r%3D%2BKwn%2BN4ZgVcWKtdX_yLSNyTqjdWGkr-q0iA%40mail.gmail.com [2]: https://www.postgresql.org/message-id/e7b63103-2a8c-4ee9-866a-ddba45ead388%40gmail.com [3]: https://www.postgresql.org/message-id/OS0PR01MB5716CE0729CEB3B5994A954194B3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com thanks Shveta
Attachment
On Tue, Nov 14, 2023 at 7:56 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote: > > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > >> Yeah good point, agree to just error out in all the case then (if we discard the > >> sync_ reserved wording proposal, which seems to be the case as probably not > >> worth the extra work). > > > > Thanks for the discussion! > > > > Here is the V33 patch set which includes the following changes: > > Thanks for working on it! > > > > > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver. > > @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, > * this only after failure, so when you promote, we still > * finish replaying as much as we can from archive and > * pg_wal before failover. > + * > + * Drop the slots for which sync is initiated but not yet > + * completed i.e. they are still waiting for the primary > + * server to catch up. > */ > if (StandbyMode && CheckForStandbyTrigger()) > { > XLogShutdownWalRcv(); > + slotsync_drop_initiated_slots(); > return XLREAD_FAIL; > } > > I had a closer look and it seems this is not located at the right place. > > Indeed, it's added here: > > switch (currentSource) > { > case XLOG_FROM_ARCHIVE: > case XLOG_FROM_PG_WAL: > > While in our case we are in > > case XLOG_FROM_STREAM: > > So I think we should move slotsync_drop_initiated_slots() in the > XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker? > (the TODO item number 2 you mentioned up-thread) > > BTW in order to prevent any corner case, would'nt also be better to > > replace: > > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) > > with something like: > > if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || slot->data.sync_state == SYNCSLOT_STATE_INITIATED) > > to ensure slots in 'i' case can never be used? > Yes, it makes sense. WIll do it. > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Wed, Nov 15, 2023 at 5:21 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v34. > Few comments on v34-0001* ======================= 1. + char buf[100]; + + buf[0] = '\0'; + + if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING) + strcat(buf, "twophase"); + if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING) + { + if (buf[0] != '\0') + strcat(buf, " and "); + strcat(buf, "failover"); + } + ereport(LOG, - (errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled", - MySubscription->name))); + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that %s can be enabled", + MySubscription->name, buf))); I feel it is better to separate elogs rather than construct the string. It would be easier for the translation. 2. - /* Initialize walsender process before entering the main command loop */ Spurious line removal 3. @@ -440,17 +448,8 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto) if (startlsn < moveto) { - SpinLockAcquire(&MyReplicationSlot->mutex); - MyReplicationSlot->data.restart_lsn = moveto; - SpinLockRelease(&MyReplicationSlot->mutex); + PhysicalConfirmReceivedLocation(moveto); retlsn = moveto; - - /* - * Dirty the slot so as it is written out at the next checkpoint. Note - * that the LSN position advanced may still be lost in the event of a - * crash, but this makes the data consistent after a clean shutdown. - */ - ReplicationSlotMarkDirty(); } I think this change has been made so that we can wakeup logical walsenders from a central location. In general, this is a good idea but it seems calling PhysicalConfirmReceivedLocation() would make an additional call to ReplicationSlotsComputeRequiredLSN() which is already called in the caller of pg_physical_replication_slot_advance(), so not sure such unification is a good idea here. 4. + * Here logical walsender associated with failover logical slot waits + * for physical standbys corresponding to physical slots specified in + * standby_slot_names GUC. + */ +void +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) In the above comments, we don't seem to follow the 80-col limit. Please check all other comments in the patch for similar problem. 5. +static void +WalSndRereadConfigAndSlots(List **standby_slots) +{ + char *pre_standby_slot_names = pstrdup(standby_slot_names); + + ProcessConfigFile(PGC_SIGHUP); + + if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) + { + list_free(*standby_slots); + *standby_slots = GetStandbySlotList(true); + } + + pfree(pre_standby_slot_names); +} The function name is misleading w.r.t the functionality. Can we name it on the lines of WalSndRereadConfigAndReInitSlotList()? I know it is a bit longer but couldn't come up with anything better. 6. + /* + * Fast path to entering the loop in case we already know we have + * enough WAL available and all the standby servers has confirmed + * receipt of WAL upto RecentFlushPtr. I think this comment is a bit misleading because it is a fast path to avoid entering the loop. I think we can keep the existing comment here: "Fast path to avoid acquiring the spinlock in case we already know ..." 7. @@ -3381,7 +3673,9 @@ WalSndWait(uint32 socket_events, long timeout, uint32 wait_event) * And, we use separate shared memory CVs for physical and logical * walsenders for selective wake ups, see WalSndWakeup() for more details. 
*/ - if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL) + if (wait_for_standby) + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); + else if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL) The comment above this change needs to be updated for the usage of this new CV. 8. +WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION "Waiting for physical standby confirmation in WAL sender process." I feel the above description is not clear. How about being more specific with something along the lines of: "Waiting for the WAL to be received by physical standby in WAL sender process." 9. + {"standby_slot_names", PGC_SIGHUP, REPLICATION_PRIMARY, + gettext_noop("List of streaming replication standby server slot " + "names that logical walsenders waits for."), I think we slightly simplify it by saying: "Lists streaming replication standby server slot names that logical WAL sender processes wait for.". It would be more consistent with a few other similar variables. 10. + gettext_noop("List of streaming replication standby server slot " + "names that logical walsenders waits for."), + gettext_noop("Decoded changes are sent out to plugins by logical " + "walsenders only after specified replication slots " + "confirm receiving WAL."), Instead of walsenders, let's use WAL sender processes. 11. @@ -6622,10 +6623,12 @@ describeSubscriptions(const char *pattern, bool verbose) appendPQExpBuffer(&buf, ", suborigin AS \"%s\"\n" ", subpasswordrequired AS \"%s\"\n" - ", subrunasowner AS \"%s\"\n", + ", subrunasowner AS \"%s\"\n" + ", subfailoverstate AS \"%s\"\n", gettext_noop("Origin"), gettext_noop("Password required"), - gettext_noop("Run as owner?")); + gettext_noop("Run as owner?"), + gettext_noop("Enable failover?")); Let's name the new column as "Failover" and also it should be displayed only when pset.sversion is >=17. 12. @@ -93,6 +97,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW bool subrunasowner; /* True if replication should execute as the * subscription owner */ + char subfailoverstate; /* Enable Failover State */ This should be listed in system_views.sql in the below GRANT statement: GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled, subbinary, substream, subtwophasestate, subdisableonerr, subpasswordrequired, subrunasowner, subslotname, subsynccommit, subpublications, suborigin) 13. + ConditionVariable wal_confirm_rcv_cv; + WalSnd walsnds[FLEXIBLE_ARRAY_MEMBER]; } WalSndCtlData; It is better to add a comment for this new variable explaining its use. -- With Regards, Amit Kapila.
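On point 1, keeping each message whole for the benefit of translators would presumably look something like this. A sketch only; MySubscription->failoverstate and LOGICALREP_FAILOVER_STATE_PENDING are the patch's additions as quoted above, the rest is existing code.

    if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
        MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING)
        ereport(LOG,
                (errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase and failover can be enabled",
                        MySubscription->name)));
    else if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING)
        ereport(LOG,
                (errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled",
                        MySubscription->name)));
    else
        ereport(LOG,
                (errmsg("logical replication apply worker for subscription \"%s\" will restart so that failover can be enabled",
                        MySubscription->name)));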
On Thu, Nov 16, 2023 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 15, 2023 at 5:21 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v34. > > > > Few comments on v34-0001* > ======================= > 1. > + char buf[100]; > + > + buf[0] = '\0'; > + > + if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING) > + strcat(buf, "twophase"); > + if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING) > + { > + if (buf[0] != '\0') > + strcat(buf, " and "); > + strcat(buf, "failover"); > + } > + > ereport(LOG, > - (errmsg("logical replication apply worker for subscription \"%s\" > will restart so that two_phase can be enabled", > - MySubscription->name))); > + (errmsg("logical replication apply worker for subscription \"%s\" > will restart so that %s can be enabled", > + MySubscription->name, buf))); > > I feel it is better to separate elogs rather than construct the > string. It would be easier for the translation. > > 2. > - > /* Initialize walsender process before entering the main command loop */ > > Spurious line removal > > 3. > @@ -440,17 +448,8 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto) > > if (startlsn < moveto) > { > - SpinLockAcquire(&MyReplicationSlot->mutex); > - MyReplicationSlot->data.restart_lsn = moveto; > - SpinLockRelease(&MyReplicationSlot->mutex); > + PhysicalConfirmReceivedLocation(moveto); > retlsn = moveto; > - > - /* > - * Dirty the slot so as it is written out at the next checkpoint. Note > - * that the LSN position advanced may still be lost in the event of a > - * crash, but this makes the data consistent after a clean shutdown. > - */ > - ReplicationSlotMarkDirty(); > } > > I think this change has been made so that we can wakeup logical > walsenders from a central location. In general, this is a good idea > but it seems calling PhysicalConfirmReceivedLocation() would make an > additional call to ReplicationSlotsComputeRequiredLSN() which is > already called in the caller of > pg_physical_replication_slot_advance(), so not sure such unification > is a good idea here. > > 4. > + * Here logical walsender associated with failover logical slot waits > + * for physical standbys corresponding to physical slots specified in > + * standby_slot_names GUC. > + */ > +void > +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) > > In the above comments, we don't seem to follow the 80-col limit. > Please check all other comments in the patch for similar problem. > > 5. > +static void > +WalSndRereadConfigAndSlots(List **standby_slots) > +{ > + char *pre_standby_slot_names = pstrdup(standby_slot_names); > + > + ProcessConfigFile(PGC_SIGHUP); > + > + if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) > + { > + list_free(*standby_slots); > + *standby_slots = GetStandbySlotList(true); > + } > + > + pfree(pre_standby_slot_names); > +} > > The function name is misleading w.r.t the functionality. Can we name > it on the lines of WalSndRereadConfigAndReInitSlotList()? I know it is > a bit longer but couldn't come up with anything better. > > 6. > + /* > + * Fast path to entering the loop in case we already know we have > + * enough WAL available and all the standby servers has confirmed > + * receipt of WAL upto RecentFlushPtr. > > I think this comment is a bit misleading because it is a fast path to > avoid entering the loop. I think we can keep the existing comment > here: "Fast path to avoid acquiring the spinlock in case we already > know ..." > > 7. 
> @@ -3381,7 +3673,9 @@ WalSndWait(uint32 socket_events, long timeout, > uint32 wait_event) > * And, we use separate shared memory CVs for physical and logical > * walsenders for selective wake ups, see WalSndWakeup() for more details. > */ > - if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL) > + if (wait_for_standby) > + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); > + else if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL) > > The comment above this change needs to be updated for the usage of this new CV. > > 8. > +WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION "Waiting for physical > standby confirmation in WAL sender process." > > I feel the above description is not clear. How about being more > specific with something along the lines of: "Waiting for the WAL to be > received by physical standby in WAL sender process." > > 9. > + {"standby_slot_names", PGC_SIGHUP, REPLICATION_PRIMARY, > + gettext_noop("List of streaming replication standby server slot " > + "names that logical walsenders waits for."), > > I think we slightly simplify it by saying: "Lists streaming > replication standby server slot names that logical WAL sender > processes wait for.". It would be more consistent with a few other > similar variables. > > 10. > + gettext_noop("List of streaming replication standby server slot " > + "names that logical walsenders waits for."), > + gettext_noop("Decoded changes are sent out to plugins by logical " > + "walsenders only after specified replication slots " > + "confirm receiving WAL."), > > Instead of walsenders, let's use WAL sender processes. > > 11. > @@ -6622,10 +6623,12 @@ describeSubscriptions(const char *pattern, bool verbose) > appendPQExpBuffer(&buf, > ", suborigin AS \"%s\"\n" > ", subpasswordrequired AS \"%s\"\n" > - ", subrunasowner AS \"%s\"\n", > + ", subrunasowner AS \"%s\"\n" > + ", subfailoverstate AS \"%s\"\n", > gettext_noop("Origin"), > gettext_noop("Password required"), > - gettext_noop("Run as owner?")); > + gettext_noop("Run as owner?"), > + gettext_noop("Enable failover?")); > > Let's name the new column as "Failover" and also it should be > displayed only when pset.sversion is >=17. > > 12. > @@ -93,6 +97,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > bool subrunasowner; /* True if replication should execute as the > * subscription owner */ > > + char subfailoverstate; /* Enable Failover State */ > > This should be listed in system_views.sql in the below GRANT statement: > GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled, > subbinary, substream, subtwophasestate, subdisableonerr, > subpasswordrequired, subrunasowner, > subslotname, subsynccommit, subpublications, suborigin) > > 13. > + ConditionVariable wal_confirm_rcv_cv; > + > WalSnd walsnds[FLEXIBLE_ARRAY_MEMBER]; > } WalSndCtlData; > > It is better to add a comment for this new variable explaining its use. > > -- > With Regards, > Amit Kapila. PFA v35. It has below changes: 1) change of default for 'enable_syncslot' to false. 2) validate the dbname provided in primary_conninfo before attempting slot-sync. 3) do not allow logical decoding on slots with 'i' sync_state. 4) support in pg_upgrade for the failover property of slot. 5) do not start slot-sync if wal_level < logical 6) shutdown the slotsync worker on promotion. Thanks Ajin for working on 4 and 5. Thanks Hou-San for working on 6. The changes are in patch001 and patch002. With above changes, comments in [1] and [2] are addressed TODO: 1) Comments in [3]. 
2) Analyze whether we need to support upgrading the slot's 'sync_state' property. [1]: https://www.postgresql.org/message-id/OS0PR01MB571652CCD42F1D08D5BD69D494B3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com [2]: https://www.postgresql.org/message-id/46070646-9e09-4566-8a62-ae31a12a510c%40gmail.com [3]: https://www.postgresql.org/message-id/CAA4eK1J%3D-kPHS1eHNBtzOQHZ64j6WSgSYQZ3fH%3D2vfiwy_48AA%40mail.gmail.com thanks Shveta
Attachment
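To put the v35 changes above in context, the standby-side setup they imply could look roughly like this. This is only a sketch against the patch set (enable_syncslot is the GUC it introduces); host, user, and slot names are illustrative, and wal_level needs a restart while the other settings take effect on reload:

    -- on the standby
    ALTER SYSTEM SET wal_level = logical;
    ALTER SYSTEM SET hot_standby_feedback = on;
    ALTER SYSTEM SET primary_slot_name = 'physical_slot1';
    -- dbname here is used only for slot synchronization and is ignored for streaming
    ALTER SYSTEM SET primary_conninfo = 'host=primary_host user=repluser dbname=postgres';
    -- new GUC from this patch set; its default becomes false in v35, so enable it explicitly
    ALTER SYSTEM SET enable_syncslot = on;
    SELECT pg_reload_conf();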
On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote: > > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > >> Yeah good point, agree to just error out in all the case then (if we > >> discard the sync_ reserved wording proposal, which seems to be the > >> case as probably not worth the extra work). > > > > Thanks for the discussion! > > > > Here is the V33 patch set which includes the following changes: > > Thanks for working on it! > > > > > 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver. > > @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr > RecPtr, bool randAccess, > * this only after failure, so when you promote, we still > * finish replaying as much as we can from archive and > * pg_wal before failover. > + * > + * Drop the slots for which sync is initiated but not yet > + * completed i.e. they are still waiting for the primary > + * server to catch up. > */ > if (StandbyMode && CheckForStandbyTrigger()) > { > XLogShutdownWalRcv(); > + slotsync_drop_initiated_slots(); > return XLREAD_FAIL; > } > > I had a closer look and it seems this is not located at the right place. > > Indeed, it's added here: > > switch (currentSource) > { > case XLOG_FROM_ARCHIVE: > case XLOG_FROM_PG_WAL: > > While in our case we are in > > case XLOG_FROM_STREAM: > > So I think we should move slotsync_drop_initiated_slots() in the > XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker? > (the TODO item number 2 you mentioned up-thread) Thanks for the comment. I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown slotsync worker and drop slots. There could be other reasons(other than promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code there. I thought if the intention is to stop slotsync workers on promotion, maybe FinishWalRecovery() is a better place to do it as it's indicating the end of recovery and XLogShutdownWalRcv is also called in it. And I feel we'd better drop the slots after shutting down the slotsync workers, because otherwise the slotsync workers could create the dropped slot again in rare cases. Best Regards, Hou zj
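For reference, the promotion event this discussion revolves around is the one a DBA triggers explicitly; a minimal sketch, with the behaviour proposed above noted as comments:

    -- on the standby, trigger promotion (equivalently: pg_ctl promote)
    SELECT pg_promote();
    -- per the proposal, the slot sync worker is shut down first, and any slots whose
    -- initial synchronization was still in progress are dropped while finishing recovery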
On Tue, Nov 14, 2023 at 12:57 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > 2) Raise error if user slot with same name already exists on standby. "ERROR: not synchronizing slot test; it is a user created slot" I just tested this using v35, and the error message produced in this case is not very good. It neither reads like an error nor makes clear what the underlying problem is or how to correct it. regards, Ajin Cherian Fujitsu Australia
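For context, a sketch of how this situation arises (slot name and plugin are illustrative; creating a logical slot on a standby requires PostgreSQL 16 or later):

    -- on the standby, a user manually creates a logical slot
    SELECT pg_create_logical_replication_slot('myslot', 'test_decoding');
    -- if a failover-enabled slot also named 'myslot' exists on the primary, the slot
    -- sync worker refuses to overwrite the user-created slot and raises the error
    -- quoted above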
Hi, On 11/16/23 1:03 PM, shveta malik wrote: > On Thu, Nov 16, 2023 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > PFA v35. It has below changes: Thanks for the update! > 6) shutdown the slotsync worker on promotion. + /* + * Shutdown the slot sync workers to prevent potential conflicts between + * user processes and slotsync workers after a promotion. Additionally, + * drop any slots that have initiated but not yet completed the sync + * process. + */ + ShutDownSlotSync(); + slotsync_drop_initiated_slots(); I think there is a corner case here. If there is promotion while slot creation is in progress (slot has just been created and is in 'i' state), then when we shutdown the sync slot worker in ShutDownSlotSync() we'll set slot->in_use = false in ReplicationSlotDropPtr(). Indeed, when we shut the sync worker down: (gdb) bt #0 ReplicationSlotDropPtr (slot=0x7f25af5c9bb0) at slot.c:734 #1 0x000056266c8106a7 in ReplicationSlotDropAcquired () at slot.c:725 #2 0x000056266c810170 in ReplicationSlotRelease () at slot.c:583 #3 0x000056266c80f420 in ReplicationSlotShmemExit (code=1, arg=0) at slot.c:189 #4 0x000056266c86213b in shmem_exit (code=1) at ipc.c:243 #5 0x000056266c861fdf in proc_exit_prepare (code=1) at ipc.c:198 #6 0x000056266c861f23 in proc_exit (code=1) at ipc.c:111 So later on, when we'll want to drop this slot in slotsync_drop_initiated_slots() we'll get things like: 2023-11-17 11:22:08.526 UTC [2195486] FATAL: replication slot "logical_slot4" does not exist Reason is that slotsync_drop_initiated_slots() does call SearchNamedReplicationSlot(): (gdb) bt #0 SearchNamedReplicationSlot (name=0x7f743f5c9ab8 "logical_slot4", need_lock=false) at slot.c:388 #1 0x0000556ef0974ec1 in ReplicationSlotAcquire (name=0x7f743f5c9ab8 "logical_slot4", nowait=true) at slot.c:484 #2 0x0000556ef09754e7 in ReplicationSlotDrop (name=0x7f743f5c9ab8 "logical_slot4", nowait=true, user_cmd=false) at slot.c:668 #3 0x0000556ef095f0a3 in slotsync_drop_initiated_slots () at slotsync.c:369 that returns a NULL slot if slot->in_use = false. One option could be to make sure slot->in_use = true before calling ReplicationSlotDrop() here? + foreach(lc, slots) + { + ReplicationSlot *s = (ReplicationSlot *) lfirst(lc); + + ReplicationSlotDrop(NameStr(s->data.name), true, false); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v35. > Review v35-0002* ============== 1. As quoted in the commit message, > If a logical slot is invalidated on the primary, slot on the standby is also invalidated. If a logical slot on the primary is valid but is invalidated on the standby due to conflict (say required rows removed on the primary), then that slot is dropped and recreated on the standby in next sync-cycle. It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). > I think this won't happen normally because of the physical slot and hot_standby_feedback but probably can occur in cases like if the user temporarily switches hot_standby_feedback from on to off. Are there any other reasons? I think we can mention the cases along with it as well at least for now. Additionally, I think this should be covered in code comments as well. 2. #include "postgres.h" - +#include "access/genam.h" Spurious line removal. 3. A password needs to be provided too, if the sender demands password authentication. It can be provided in the <varname>primary_conninfo</varname> string, or in a separate - <filename>~/.pgpass</filename> file on the standby server (use - <literal>replication</literal> as the database name). - Do not specify a database name in the - <varname>primary_conninfo</varname> string. + <filename>~/.pgpass</filename> file on the standby server. + </para> + <para> + Specify <literal>dbname</literal> in + <varname>primary_conninfo</varname> string to allow synchronization + of slots from the primary server to the standby server. + This will only be used for slot synchronization. It is ignored + for streaming. Is there a reason to remove part of the earlier sentence "use <literal>replication</literal> as the database name"? 4. + <primary><varname>enable_syncslot</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + It enables a physical standby to synchronize logical failover slots + from the primary server so that logical subscribers are not blocked + after failover. + </para> + <para> + It is enabled by default. This parameter can only be set in the + <filename>postgresql.conf</filename> file or on the server command line. + </para> I think you forgot to update the documentation for the default value of this variable. 5. + * a) start the logical replication workers for every enabled subscription + * when not in standby_mode + * b) start the slot-sync worker for logical failover slots synchronization + * from the primary server when in standby_mode. Either use a full stop after both lines or none of these. 6. +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker); There shouldn't be space between * and the worker. 7. + if (!SlotSyncWorker->hdr.in_use) + { + LWLockRelease(SlotSyncWorkerLock); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("replication slot-sync worker not initialized, " + "cannot attach"))); + } + + if (SlotSyncWorker->hdr.proc) + { + LWLockRelease(SlotSyncWorkerLock); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("replication slot-sync worker is " + "already running, cannot attach"))); + } Using slot-sync in the error messages looks a bit odd to me. Can we use "replication slot sync worker ..." in both these and other similar messages? 
I think it would be better if we don't split the messages into multiple lines in these cases as messages don't appear too long to me. 8. +/* + * Detach the worker from DSM and update 'proc' and 'in_use'. + * Logical replication launcher will come to know using these + * that the worker has shutdown. + */ +void +slotsync_worker_detach(int code, Datum arg) +{ I think the reference to DSM is leftover from the previous version of the patch. Can we change the above comments as per the new code? 9. +static bool +slotsync_worker_launch() { ... + /* TODO: do we really need 'generation', analyse more here */ + worker->hdr.generation++; We should do something about this TODO. As per my understanding, we don't need a generation number for the slot sync worker as we have one such worker but I guess the patch requires it because we are using existing logical replication worker infrastructure. This brings the question of whether we really need a separate SlotSyncWorkerInfo or if we can use existing LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you didn't use it because most of the fields in LogicalRepWorker will be unused for slot sync worker. 10. + * Can't use existing functions like 'get_database_oid' from dbcommands.c for + * validity purpose as they need db connection. + */ +static bool +validate_dbname(const char *dbname) I don't know how important it is to validate the dbname before launching the sync slot worker because anyway after launching, it will give an error while initializing the connection if the dbname is invalid. But, if we think it is really required, did you consider using GetDatabaseTuple()? -- With Regards, Amit Kapila.
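As an aside on comment 1, a conflict-invalidated but synchronized slot would be visible on the standby with a query along these lines (a sketch; failover and sync_state are columns added by this patch set, while the conflicting column already exists in PostgreSQL 16):

    SELECT slot_name, failover, sync_state, conflicting
    FROM pg_replication_slots
    WHERE slot_type = 'logical';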
Hi, On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: > On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: >> On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote: >>> On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand >> <bertranddrouvot.pg@gmail.com> wrote: >>>> Yeah good point, agree to just error out in all the case then (if we >>>> discard the sync_ reserved wording proposal, which seems to be the >>>> case as probably not worth the extra work). >>> >>> Thanks for the discussion! >>> >>> Here is the V33 patch set which includes the following changes: >> >> Thanks for working on it! >> >>> >>> 1) Drop slots with state 'i' in promotion flow after we shut down WalReceiver. >> >> @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr >> RecPtr, bool randAccess, >> * this only after failure, so when you promote, we still >> * finish replaying as much as we can from archive and >> * pg_wal before failover. >> + * >> + * Drop the slots for which sync is initiated but not yet >> + * completed i.e. they are still waiting for the primary >> + * server to catch up. >> */ >> if (StandbyMode && CheckForStandbyTrigger()) >> { >> XLogShutdownWalRcv(); >> + slotsync_drop_initiated_slots(); >> return XLREAD_FAIL; >> } >> >> I had a closer look and it seems this is not located at the right place. >> >> Indeed, it's added here: >> >> switch (currentSource) >> { >> case XLOG_FROM_ARCHIVE: >> case XLOG_FROM_PG_WAL: >> >> While in our case we are in >> >> case XLOG_FROM_STREAM: >> >> So I think we should move slotsync_drop_initiated_slots() in the >> XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker? >> (the TODO item number 2 you mentioned up-thread) > > Thanks for the comment. > > I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown > slotsync worker and drop slots. There could be other reasons(other than > promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code > there. I thought if the intention is to stop slotsync workers on promotion, > maybe FinishWalRecovery() is a better place to do it as it's indicating the end > of recovery and XLogShutdownWalRcv is also called in it. I can see that slotsync_drop_initiated_slots() has been moved in FinishWalRecovery() in v35. That looks ok. > > And I feel we'd better drop the slots after shutting down the slotsync workers, > because otherwise the slotsync workers could create the dropped slot again in > rare cases. Yeah, agree and I can see that it's done that way in v35. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, November 16, 2023 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 15, 2023 at 5:21 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v34. > > > > Few comments on v34-0001* > ======================= > [...] Thanks for the comments. Here is the new version patch set which addressed all above comments and the comment in [1]. [1] https://www.postgresql.org/message-id/1e0b2eb4-c977-482d-b16e-c52711c34d6c%40gmail.com Best Regards, Hou zj
Attachment
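For readers following the 0001 part of the series, the primary- and subscriber-side pieces it covers would be used roughly as follows (a sketch; slot, subscription, and publication names are illustrative):

    -- on the primary: make logical walsenders for failover-enabled slots wait for this
    -- physical standby's slot before sending decoded changes
    ALTER SYSTEM SET standby_slot_names = 'physical_slot1';
    SELECT pg_reload_conf();

    -- on the subscriber: request the failover property for the subscription's slot
    CREATE SUBSCRIPTION mysub
      CONNECTION 'host=primary_host dbname=postgres user=repluser'
      PUBLICATION mypub
      WITH (failover = true);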
On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > > > I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown > > slotsync worker and drop slots. There could be other reasons(other than > > promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code > > there. I thought if the intention is to stop slotsync workers on promotion, > > maybe FinishWalRecovery() is a better place to do it as it's indicating the end > > of recovery and XLogShutdownWalRcv is also called in it. > > I can see that slotsync_drop_initiated_slots() has been moved in FinishWalRecovery() > in v35. That looks ok. > > I was thinking what if we just ignore creating such slots (which require init state) in the first place? I think that can be time-consuming in some cases but it will reduce the complexity and we can always improve such cases later if we really encounter them in the real world. I am not very sure that added complexity is worth addressing this particular case, so I would like to know your and others' opinions. More Review for v35-0002* ============================ 1. + ereport(WARNING, + errmsg("skipping slots synchronization as primary_slot_name " + "is not set.")); There is no need to use a full stop at the end for WARNING messages and as previously mentioned, let's not split message lines in such cases. There are other messages in the patch with similar problems, please fix those as well. 2. +slotsync_checks() { ... ... + /* The hot_standby_feedback must be ON for slot-sync to work */ + if (!hot_standby_feedback) + { + ereport(WARNING, + errmsg("skipping slots synchronization as hot_standby_feedback " + "is off.")); This message has the same problem as mentioned in the previous comment. Additionally, I think either atop slotsync_checks or along with GUC check we should write comments as to why we expect these values to be set for slot sync to work. 3. + /* The worker is running already */ + if (SlotSyncWorker &&SlotSyncWorker->hdr.in_use + && SlotSyncWorker->hdr.proc) The spacing for both the &&'s has problems. You need a space after the first && and the second && should be in the prior line. 4. + LauncherRereadConfig(&recheck_slotsync); + } An empty line after LauncherRereadConfig() is not required. 5. +static void +LauncherRereadConfig(bool *ss_recheck) +{ + char *conninfo = pstrdup(PrimaryConnInfo); + char *slotname = pstrdup(PrimarySlotName); + bool syncslot = enable_syncslot; + bool feedback = hot_standby_feedback; Can we change the variable name 'feedback' to 'standbyfeedback' to make it slightly more descriptive? 6. The logic to recheck the slot_sync related parameters in LauncherMain() is not very clear. IIUC, if after reload config any parameter is changed, we just seem to be checking the validity of the changed parameter but not restarting the slot sync worker, is that correct? If so, what if dbname is changed, don't we need to restart the slot-sync worker and re-initialize the connection; similarly slotname change also needs some thoughts. Also, if all the parameters are valid we seem to be re-launching the slot-sync worker without first stopping it which doesn't seem correct, am I missing something in this logic? 7. 
@@ -524,6 +525,25 @@ CreateDecodingContext(XLogRecPtr start_lsn, errmsg("replication slot \"%s\" was not created in this database", NameStr(slot->data.name)))); + in_recovery = RecoveryInProgress(); + + /* + * Do not allow consumption of a "synchronized" slot until the standby + * gets promoted. Also do not allow consumption of slots with sync_state + * as SYNCSLOT_STATE_INITIATED as they are not synced completely to be + * used. + */ + if ((in_recovery && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || + slot->data.sync_state == SYNCSLOT_STATE_INITIATED) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot use replication slot \"%s\" for logical decoding", + NameStr(slot->data.name)), + in_recovery ? + errdetail("This slot is being synced from the primary server.") : + errdetail("This slot was not synced completely from the primary server."), + errhint("Specify another replication slot."))); + If we are planning to drop slots in state SYNCSLOT_STATE_INITIATED at the time of promotion, don't we need to just have an assert or elog(ERROR, .. for non-recovery cases as such cases won't be reachable? If so, I think we can separate out that case here. 8. wait_for_primary_slot_catchup() { ... + /* Check if this standby is promoted while we are waiting */ + if (!RecoveryInProgress()) + { + /* + * The remote slot didn't pass the locally reserved position at + * the time of local promotion, so it's not safe to use. + */ + ereport( + WARNING, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg( + "slot-sync wait for slot %s interrupted by promotion, " + "slot creation aborted", remote_slot->name))); + pfree(cmd.data); + return false; + } ... } Shouldn't this be an Assert because a slot-sync worker shouldn't exist for non-standby servers? 9. wait_for_primary_slot_catchup() { ... + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + if (!tuplestore_gettupleslot(res->tuplestore, true, false, slot)) + { + ereport(WARNING, + (errmsg("slot \"%s\" disappeared from the primary server," + " slot creation aborted", remote_slot->name))); + pfree(cmd.data); + walrcv_clear_result(res); + return false; If the slot on primary disappears, shouldn't this part of the code somehow ensure to remove the slot on standby as well? If it is taken at some other point in time then at least we should write a comment here to state how it is taken care of. I think this comment also applies to a few other checks following this check. 10. + /* + * It is possible to get null values for lsns and xmin if slot is + * invalidated on the primary server, so handle accordingly. + */ + new_invalidated = DatumGetBool(slot_getattr(slot, 1, &isnull)); We can say LSN and Xmin in the above comment to make it easier to read/understand. 11. /* + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin + * are expected to be valid/non-null, so assert if found null. + */ No need to explicitly say about assert, it is clear from the code. We can slightly change this comment to: "Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin are expected to be valid/non-null." 12. + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || + TransactionIdPrecedes(remote_slot->catalog_xmin, + MyReplicationSlot->data.catalog_xmin)) + { + if (!wait_for_primary_slot_catchup(wrconn, remote_slot)) + { + /* + * The remote slot didn't catch up to locally reserved position. + * But still persist it and attempt the wait and sync in next + * sync-cycle. 
+ */ + if (MyReplicationSlot->data.persistency != RS_PERSISTENT) + { + ReplicationSlotPersist(); + *slot_updated = true; + } I think the reason to persist in this case is because next time local restart_lsn can be ahead than the current location and it can take more time to create such a slot. We can probably mention the same in the comments. -- With Regards, Amit Kapila.
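To make comment 7 above concrete, the user-visible effect on the standby would be along these lines (slot name hypothetical; the messages are the ones proposed in the patch):

    -- on the standby, trying to consume a slot that is being synchronized
    SELECT * FROM pg_logical_slot_get_changes('failover_slot', NULL, NULL);
    -- ERROR:  cannot use replication slot "failover_slot" for logical decoding
    -- DETAIL: This slot is being synced from the primary server.
    -- HINT:   Specify another replication slot.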
Hi, On 11/18/23 11:45 AM, Amit Kapila wrote: > On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: >>> On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: >>> >>> I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown >>> slotsync worker and drop slots. There could be other reasons(other than >>> promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code >>> there. I thought if the intention is to stop slotsync workers on promotion, >>> maybe FinishWalRecovery() is a better place to do it as it's indicating the end >>> of recovery and XLogShutdownWalRcv is also called in it. >> >> I can see that slotsync_drop_initiated_slots() has been moved in FinishWalRecovery() >> in v35. That looks ok. >>> > > I was thinking what if we just ignore creating such slots (which > require init state) in the first place? I think that can be > time-consuming in some cases but it will reduce the complexity and we > can always improve such cases later if we really encounter them in the > real world. I am not very sure that added complexity is worth > addressing this particular case, so I would like to know your and > others' opinions. > I'm not sure I understand your point. Are you saying that we should not create slots on the standby that are "currently" reported in a 'i' state? (so just keep the 'r' and 'n' states?) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Sat, Nov 18, 2023 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > More Review for v35-0002* > ============================ > More review of v35-0002* ==================== 1. +/* + * Helper function to check if local_slot is present in remote_slots list. + * + * It also checks if logical slot is locally invalidated i.e. invalidated on + * the standby but valid on the primary server. If found so, it sets + * locally_invalidated to true. + */ +static bool +slot_exists_in_list(ReplicationSlot *local_slot, List *remote_slots, + bool *locally_invalidated) The name of the function is a bit misleading because it checks the validity of the slot not only whether it exists in remote_list. Would it be better to name it as ValidateSyncSlot() or something along those lines? 2. +static long +synchronize_slots(WalReceiverConn *wrconn) { ... + /* Construct query to get slots info from the primary server */ + initStringInfo(&s); + construct_slot_query(&s); ... + if (remote_slot->conflicting) + remote_slot->invalidated = get_remote_invalidation_cause(wrconn, + remote_slot->name); ... +static ReplicationSlotInvalidationCause +get_remote_invalidation_cause(WalReceiverConn *wrconn, char *slot_name) { ... + appendStringInfo(&cmd, + "SELECT pg_get_slot_invalidation_cause(%s)", + quote_literal_cstr(slot_name)); + res = walrcv_exec(wrconn, cmd.data, 1, slotRow); Do we really need to query a second time to get the invalidation cause? Can we adjust the slot_query to get it in one round trip? I think this may not optimize much because the patch uses second round trip only for invalidated slots but still looks odd. So unless the query becomes too complicated, we should try to achive it one round trip. 3. +static long +synchronize_slots(WalReceiverConn *wrconn) +{ ... ... + /* The syscache access needs a transaction env. */ + StartTransactionCommand(); + + /* Make things live outside TX context */ + MemoryContextSwitchTo(oldctx); + + /* Construct query to get slots info from the primary server */ + initStringInfo(&s); + construct_slot_query(&s); + + elog(DEBUG2, "slot-sync worker's query:%s \n", s.data); + + /* Execute the query */ + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); It is okay to perform the above query execution outside the transaction context but I would like to know the reason for the same. Do we want to retain anything beyond the transaction context or is there some other reason to do this outside the transaction context? 4. +static void +construct_slot_query(StringInfo s) +{ + /* + * Fetch data for logical failover slots with sync_state either as + * SYNCSLOT_STATE_NONE or SYNCSLOT_STATE_READY. + */ + appendStringInfo(s, + "SELECT slot_name, plugin, confirmed_flush_lsn," + " restart_lsn, catalog_xmin, two_phase, conflicting, " + " database FROM pg_catalog.pg_replication_slots" + " WHERE failover and sync_state != 'i'"); +} Why would the sync_state on the primary server be any valid value? I thought it was set only on physical standby. I think it is better to mention the reason for using the sync state and or failover flag in the above comments. The current comment doesn't seem of much use as it just states what is evident from the query. 5. * This check should never pass as on the primary server, we have waited + * for the standby's confirmation before updating the logical slot. But to + * take care of any bug in that flow, we should retain this check. 
+ */ + if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd) + { + elog(LOG, "skipping sync of slot \"%s\" as the received slot-sync " + "LSN %X/%X is ahead of the standby position %X/%X", + remote_slot->name, + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + LSN_FORMAT_ARGS(WalRcv->latestWalEnd)); + This should be elog(ERROR, ..). Normally, we use elog(ERROR, ...) for such unexpected cases. And, you don't need to explicitly mention the last sentence in the comment: "But to take care of any bug in that flow, we should retain this check.". 6. +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) { ... + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) + { + ereport(WARNING, + errmsg("not synchronizing slot %s; synchronization would" + " move it backwards", remote_slot->name)); I think here elevel should be LOG because user can't do much about this. Do we use ';' at other places in the message? But when can we hit this case? We can add some comments to state in which scenario this possible. OTOH, if this is sort of can't happen case and we have kept it to avoid any sort of inconsistency then we can probably use elog(ERROR, .. with approapriate LSN locations, so that later the problem could be debugged. 7. +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) { ... + + StartTransactionCommand(); + + /* Make things live outside TX context */ + MemoryContextSwitchTo(oldctx); + ... Similar to one of the previous comments, it is not clear to me why the patch is doing a memory context switch here. Can we add a comment? 8. + /* User created slot with the same name exists, raise ERROR. */ + else if (sync_state == SYNCSLOT_STATE_NONE) + { + ereport(ERROR, + errmsg("not synchronizing slot %s; it is a user created slot", + remote_slot->name)); + } Won't we need error_code in this error? Also, the message doesn't seem to follow the code's usual style. 9. +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) { ... + else + { + TransactionId xmin_horizon = InvalidTransactionId; + ReplicationSlot *slot; + + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, + remote_slot->two_phase, false); + slot = MyReplicationSlot; + + SpinLockAcquire(&slot->mutex); + slot->data.database = get_database_oid(remote_slot->database, false); + + /* Mark it as sync initiated by slot-sync worker */ + slot->data.sync_state = SYNCSLOT_STATE_INITIATED; + slot->data.failover = true; + + namestrcpy(&slot->data.plugin, remote_slot->plugin); + SpinLockRelease(&slot->mutex); + + ReplicationSlotReserveWal(); + How and when will this init state (SYNCSLOT_STATE_INITIATED) persist to disk? 10. + if (slot_updated) + SlotSyncWorker->last_update_time = now; + + else if (TimestampDifferenceExceeds(SlotSyncWorker->last_update_time, + now, WORKER_INACTIVITY_THRESHOLD_MS)) Empty line between if/else if is not required. 11. +static WalReceiverConn * +remote_connect() +{ + WalReceiverConn *wrconn = NULL; + char *err; + + wrconn = walrcv_connect(PrimaryConnInfo, true, false, "slot-sync", &err); + if (wrconn == NULL) + ereport(ERROR, + (errmsg("could not connect to the primary server: %s", err))); Let's use appname similar to what we do for "walreceiver" as shown below: /* Establish the connection to the primary for XLOG streaming */ wrconn = walrcv_connect(conninfo, false, false, cluster_name[0] ? 
cluster_name : "walreceiver", &err); if (!wrconn) ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not connect to the primary server: %s", err))); Some proposals for default appname "slotsynchronizer", "slotsync worker". Also, use the same error code as used by "walreceiver". 12. Do we need the handling of the slotsync worker in GetBackendTypeDesc()? Please check without that what value this patch displays for backend_type. 13. +/* + * Re-read the config file. + * + * If primary_conninfo has changed, reconnect to primary. + */ +static void +slotsync_reread_config(WalReceiverConn **wrconn) +{ + char *conninfo = pstrdup(PrimaryConnInfo); + + ConfigReloadPending = false; + ProcessConfigFile(PGC_SIGHUP); + + /* Reconnect if GUC primary_conninfo got changed */ + if (strcmp(conninfo, PrimaryConnInfo) != 0) + { + if (*wrconn) + walrcv_disconnect(*wrconn); + + *wrconn = remote_connect(); I think we should exit the worker in this case and allow it to reconnect. See the similar handling in maybe_reread_subscription(). One effect of not doing is that the dbname patch has used in ReplSlotSyncWorkerMain() will become inconsistent. 14. +void +ReplSlotSyncWorkerMain(Datum main_arg) +{ ... ... + /* + * If the standby has been promoted, skip the slot synchronization process. + * + * Although the startup process stops all the slot-sync workers on + * promotion, the launcher may not have realized the promotion and could + * start additional workers after that. Therefore, this check is still + * necessary to prevent these additional workers from running. + */ + if (PromoteIsTriggered()) + exit(0); ... ... + /* Check if got promoted */ + if (!RecoveryInProgress()) + { + /* + * Drop the slots for which sync is initiated but not yet + * completed i.e. they are still waiting for the primary server to + * catch up. + */ + slotsync_drop_initiated_slots(); + ereport(LOG, + errmsg("exiting slot-sync woker on promotion of standby")); I think we should never reach this code in non-standby mode. It should elog(ERROR,.. Can you please explain why promotion handling is required here? 15. @@ -190,6 +190,8 @@ static const char *const BuiltinTrancheNames[] = { "LogicalRepLauncherDSA", /* LWTRANCHE_LAUNCHER_HASH: */ "LogicalRepLauncherHash", + /* LWTRANCHE_SLOTSYNC_DSA: */ + "SlotSyncWorkerDSA", }; ... ... + LWTRANCHE_SLOTSYNC_DSA, LWTRANCHE_FIRST_USER_DEFINED, } BuiltinTrancheIds; These are not used in the patch. 16. +/* ------------------------------- + * LIST_DBID_FOR_FAILOVER_SLOTS command + * ------------------------------- + */ +typedef struct ListDBForFailoverSlotsCmd +{ + NodeTag type; + List *slot_names; +} ListDBForFailoverSlotsCmd; ... +/* + * Failover logical slots data received from remote. + */ +typedef struct WalRcvFailoverSlotsData +{ + Oid dboid; +} WalRcvFailoverSlotsData; These structures don't seem to be used in the current version of the patch. 17. --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -15,7 +15,6 @@ #include "storage/lwlock.h" #include "storage/shmem.h" #include "storage/spin.h" -#include "replication/walreceiver.h" ... ... -extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn); extern List *GetStandbySlotList(bool copy); Why the above two are removed as part of this patch? -- With Regards, Amit Kapila.
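Regarding comment 2, one possible single-round-trip shape of the query is sketched below; it simply folds the second query into the first, evaluating the cause only for conflicting slots (pg_get_slot_invalidation_cause() is the function the patch already relies on):

    SELECT slot_name, plugin, confirmed_flush_lsn, restart_lsn, catalog_xmin,
           two_phase, conflicting, database,
           CASE WHEN conflicting
                THEN pg_get_slot_invalidation_cause(slot_name)
           END AS invalidation_cause
    FROM pg_catalog.pg_replication_slots
    WHERE failover AND sync_state != 'i';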
On Mon, Nov 20, 2023 at 3:17 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/18/23 11:45 AM, Amit Kapila wrote: > > On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: > >>> On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > >>> > >>> I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown > >>> slotsync worker and drop slots. There could be other reasons(other than > >>> promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code > >>> there. I thought if the intention is to stop slotsync workers on promotion, > >>> maybe FinishWalRecovery() is a better place to do it as it's indicating the end > >>> of recovery and XLogShutdownWalRcv is also called in it. > >> > >> I can see that slotsync_drop_initiated_slots() has been moved in FinishWalRecovery() > >> in v35. That looks ok. > >>> > > > > I was thinking what if we just ignore creating such slots (which > > require init state) in the first place? I think that can be > > time-consuming in some cases but it will reduce the complexity and we > > can always improve such cases later if we really encounter them in the > > real world. I am not very sure that added complexity is worth > > addressing this particular case, so I would like to know your and > > others' opinions. > > > > I'm not sure I understand your point. Are you saying that we should not create > slots on the standby that are "currently" reported in a 'i' state? (so just keep > the 'r' and 'n' states?) > Yes. -- With Regards, Amit Kapila.
On 11/20/23 11:59 AM, Amit Kapila wrote: > On Mon, Nov 20, 2023 at 3:17 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 11/18/23 11:45 AM, Amit Kapila wrote: >>> On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand >>> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>>> On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: >>>>> On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: >>>>> >>>>> I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown >>>>> slotsync worker and drop slots. There could be other reasons(other than >>>>> promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code >>>>> there. I thought if the intention is to stop slotsync workers on promotion, >>>>> maybe FinishWalRecovery() is a better place to do it as it's indicating the end >>>>> of recovery and XLogShutdownWalRcv is also called in it. >>>> >>>> I can see that slotsync_drop_initiated_slots() has been moved in FinishWalRecovery() >>>> in v35. That looks ok. >>>>> >>> >>> I was thinking what if we just ignore creating such slots (which >>> require init state) in the first place? I think that can be >>> time-consuming in some cases but it will reduce the complexity and we >>> can always improve such cases later if we really encounter them in the >>> real world. I am not very sure that added complexity is worth >>> addressing this particular case, so I would like to know your and >>> others' opinions. >>> >> >> I'm not sure I understand your point. Are you saying that we should not create >> slots on the standby that are "currently" reported in a 'i' state? (so just keep >> the 'r' and 'n' states?) >> > > Yes. > As far the 'i' state here, from what I see, it is currently useful for: 1. Cascading standby to not sync slots with state = 'i' from the first standby. 2. Easily report Slots that did not catch up on the primary yet. 3. Avoid inactive slots to block "active" ones creation. So not creating those slots should not be an issue for 1. (sync are not needed on cascading standby as not created on the first standby yet) but is an issue for 2. (unless we provide another way to keep track and report such slots) and 3. (as I think we should still need to reserve WAL). I've a question: we'd still need to reserve WAL for those slots, no? If that's the case and if we don't call ReplicationSlotCreate() then ReplicationSlotReserveWal() would not work as MyReplicationSlot would be NULL. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
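One way point 2 above could be surfaced to users is a simple query on the standby (a sketch, assuming the sync_state column added by the patch, where 'i' means sync initiated but not yet completed):

    SELECT slot_name, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE sync_state = 'i';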
On Mon, Nov 20, 2023 at 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > 9. > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > { > ... > + else > + { > + TransactionId xmin_horizon = InvalidTransactionId; > + ReplicationSlot *slot; > + > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > + remote_slot->two_phase, false); > + slot = MyReplicationSlot; > + > + SpinLockAcquire(&slot->mutex); > + slot->data.database = get_database_oid(remote_slot->database, false); > + > + /* Mark it as sync initiated by slot-sync worker */ > + slot->data.sync_state = SYNCSLOT_STATE_INITIATED; > + slot->data.failover = true; > + > + namestrcpy(&slot->data.plugin, remote_slot->plugin); > + SpinLockRelease(&slot->mutex); > + > + ReplicationSlotReserveWal(); > + > > How and when will this init state (SYNCSLOT_STATE_INITIATED) persist to disk? > On closer inspection, I see that it is done inside wait_for_primary_and_sync() when it fails to sync. I think it is better to refactor the code a bit and persist it in synchronize_one_slot() to make the code flow easier to understand. -- With Regards, Amit Kapila.
On Saturday, November 18, 2023 6:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: > > > On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > I feel the WaitForWALToBecomeAvailable may not be the best place to > > > shutdown slotsync worker and drop slots. There could be other > > > reasons(other than > > > promotion) as mentioned in comments in case XLOG_FROM_STREAM to > > > reach the code there. I thought if the intention is to stop slotsync > > > workers on promotion, maybe FinishWalRecovery() is a better place to > > > do it as it's indicating the end of recovery and XLogShutdownWalRcv is also > called in it. > > > > I can see that slotsync_drop_initiated_slots() has been moved in > > FinishWalRecovery() in v35. That looks ok. > > > > > > More Review for v35-0002* Thanks for the comments. > ============================ > 1. > + ereport(WARNING, > + errmsg("skipping slots synchronization as primary_slot_name " > + "is not set.")); > > There is no need to use a full stop at the end for WARNING messages and as > previously mentioned, let's not split message lines in such cases. There are > other messages in the patch with similar problems, please fix those as well. Adjusted. > > 2. > +slotsync_checks() > { > ... > ... > + /* The hot_standby_feedback must be ON for slot-sync to work */ if > + (!hot_standby_feedback) { ereport(WARNING, errmsg("skipping slots > + synchronization as hot_standby_feedback " > + "is off.")); > > This message has the same problem as mentioned in the previous comment. > Additionally, I think either atop slotsync_checks or along with GUC check we > should write comments as to why we expect these values to be set for slot sync > to work. Added comments for these cases. > > 3. > + /* The worker is running already */ > + if (SlotSyncWorker &&SlotSyncWorker->hdr.in_use && > + SlotSyncWorker->hdr.proc) > > The spacing for both the &&'s has problems. You need a space after the first > && and the second && should be in the prior line. Adjusted. > > 4. > + LauncherRereadConfig(&recheck_slotsync); > + > } > > An empty line after LauncherRereadConfig() is not required. > > 5. > +static void > +LauncherRereadConfig(bool *ss_recheck) > +{ > + char *conninfo = pstrdup(PrimaryConnInfo); > + char *slotname = pstrdup(PrimarySlotName); > + bool syncslot = enable_syncslot; > + bool feedback = hot_standby_feedback; > > Can we change the variable name 'feedback' to 'standbyfeedback' to make it > slightly more descriptive? Changed. > > 6. The logic to recheck the slot_sync related parameters in > LauncherMain() is not very clear. IIUC, if after reload config any parameter is > changed, we just seem to be checking the validity of the changed parameter > but not restarting the slot sync worker, is that correct? If so, what if dbname is > changed, don't we need to restart the slot-sync worker and re-initialize the > connection; similarly slotname change also needs some thoughts. Also, if all the > parameters are valid we seem to be re-launching the slot-sync worker without > first stopping it which doesn't seem correct, am I missing something in this > logic? I think the slot sync worker will be stopped in LauncherRereadConfig() if GUC changed and new slot sync worker will be started in next loop in LauncherMain(). > 7. 
> @@ -524,6 +525,25 @@ CreateDecodingContext(XLogRecPtr start_lsn, > errmsg("replication slot \"%s\" was not created in this database", > NameStr(slot->data.name)))); > > + in_recovery = RecoveryInProgress(); > + > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. Also do not allow consumption of slots with > + sync_state > + * as SYNCSLOT_STATE_INITIATED as they are not synced completely to be > + * used. > + */ > + if ((in_recovery && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || > + slot->data.sync_state == SYNCSLOT_STATE_INITIATED) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot use replication slot \"%s\" for logical decoding", > + NameStr(slot->data.name)), in_recovery ? > + errdetail("This slot is being synced from the primary server.") : > + errdetail("This slot was not synced completely from the primary > + server."), errhint("Specify another replication slot."))); > + > > If we are planning to drop slots in state SYNCSLOT_STATE_INITIATED at the > time of promotion, don't we need to just have an assert or elog(ERROR, .. for > non-recovery cases as such cases won't be reachable? If so, I think we can > separate out that case here. Adjusted the codes as suggested. > > 8. > wait_for_primary_slot_catchup() > { > ... > + /* Check if this standby is promoted while we are waiting */ if > + (!RecoveryInProgress()) { > + /* > + * The remote slot didn't pass the locally reserved position at > + * the time of local promotion, so it's not safe to use. > + */ > + ereport( > + WARNING, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg( > + "slot-sync wait for slot %s interrupted by promotion, " > + "slot creation aborted", remote_slot->name))); pfree(cmd.data); return > + false; } > ... > } > > Shouldn't this be an Assert because a slot-sync worker shouldn't exist for > non-standby servers? Changed to Assert. > > 9. > wait_for_primary_slot_catchup() > { > ... > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > + if (!tuplestore_gettupleslot(res->tuplestore, true, false, slot)) { > + ereport(WARNING, (errmsg("slot \"%s\" disappeared from the primary > + server," > + " slot creation aborted", remote_slot->name))); pfree(cmd.data); > + walrcv_clear_result(res); return false; > > If the slot on primary disappears, shouldn't this part of the code somehow > ensure to remove the slot on standby as well? If it is taken at some other point > in time then at least we should write a comment here to state how it is taken > care of. I think this comment also applies to a few other checks following this > check. I adjusted the code here to not persist the slots if the slot disappeared or invalidated on primary, so that the local slot will get dropped when releasing. > > 10. > + /* > + * It is possible to get null values for lsns and xmin if slot is > + * invalidated on the primary server, so handle accordingly. > + */ > + new_invalidated = DatumGetBool(slot_getattr(slot, 1, &isnull)); > > We can say LSN and Xmin in the above comment to make it easier to > read/understand. Changed. > > 11. > /* > + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin > + * are expected to be valid/non-null, so assert if found null. > + */ > > No need to explicitly say about assert, it is clear from the code. We can slightly > change this comment to: "Once we got valid restart_lsn, then confirmed_lsn > and catalog_xmin are expected to be valid/non-null." Changed. > > 12. 
> + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || > + TransactionIdPrecedes(remote_slot->catalog_xmin, > + MyReplicationSlot->data.catalog_xmin)) > + { > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot)) { > + /* > + * The remote slot didn't catch up to locally reserved position. > + * But still persist it and attempt the wait and sync in next > + * sync-cycle. > + */ > + if (MyReplicationSlot->data.persistency != RS_PERSISTENT) { > + ReplicationSlotPersist(); *slot_updated = true; } > > I think the reason to persist in this case is because next time local restart_lsn can > be ahead than the current location and it can take more time to create such a > slot. We can probably mention the same in the comments. Updated the comments. Here is the V37 patch set which addressed comments above and [1]. [1] https://www.postgresql.org/message-id/CAA4eK1%2BP9R3GO2rwGBg2EOh%3DuYjWUSEOHD8yvs4Je8WYa2RHag%40mail.gmail.com Best Regards, Hou zj
Attachment
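To illustrate the "slot disappeared" handling adjusted above for comment 9 (slot name hypothetical):

    -- on the primary, a failover slot is dropped while the standby is still building
    -- its local copy
    SELECT pg_drop_replication_slot('failover_slot');
    -- with the v37 change, the standby's not-yet-persisted copy is simply released,
    -- and therefore dropped, at the end of that sync attempt instead of being persisted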
On Friday, November 17, 2023 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > PFA v35. > > > > Review v35-0002* > ============== Thanks for the comments. > 1. > As quoted in the commit message, > > > If a logical slot is invalidated on the primary, slot on the standby is also > invalidated. If a logical slot on the primary is valid but is invalidated on the > standby due to conflict (say required rows removed on the primary), then that > slot is dropped and recreated on the standby in next sync-cycle. > It is okay to recreate such slots as long as these are not consumable on the > standby (which is the case currently). > > > > I think this won't happen normally because of the physical slot and > hot_standby_feedback but probably can occur in cases like if the user > temporarily switches hot_standby_feedback from on to off. Are there any other > reasons? I think we can mention the cases along with it as well at least for now. > Additionally, I think this should be covered in code comments as well. I will collect all these cases and update in next version. > > 2. > #include "postgres.h" > - > +#include "access/genam.h" > > Spurious line removal. Removed. > > 3. > A password needs to be provided too, if the sender demands > password > authentication. It can be provided in the > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > - <literal>replication</literal> as the database name). > - Do not specify a database name in the > - <varname>primary_conninfo</varname> string. > + <filename>~/.pgpass</filename> file on the standby server. > + </para> > + <para> > + Specify <literal>dbname</literal> in > + <varname>primary_conninfo</varname> string to allow > synchronization > + of slots from the primary server to the standby server. > + This will only be used for slot synchronization. It is ignored > + for streaming. > > Is there a reason to remove part of the earlier sentence "use > <literal>replication</literal> as the database name"? Added it back. > > 4. > + <primary><varname>enable_syncslot</varname> configuration > parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + It enables a physical standby to synchronize logical failover slots > + from the primary server so that logical subscribers are not blocked > + after failover. > + </para> > + <para> > + It is enabled by default. This parameter can only be set in the > + <filename>postgresql.conf</filename> file or on the server > command line. > + </para> > > I think you forgot to update the documentation for the default value of this > variable. Updated. > > 5. > + * a) start the logical replication workers for every enabled subscription > + * when not in standby_mode > + * b) start the slot-sync worker for logical failover slots synchronization > + * from the primary server when in standby_mode. > > Either use a full stop after both lines or none of these. Added a full stop. > > 6. > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker); > > There shouldn't be space between * and the worker. Removed, and added the type to typedefs.list. > > 7. 
> + if (!SlotSyncWorker->hdr.in_use) > + { > + LWLockRelease(SlotSyncWorkerLock); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("replication slot-sync worker not initialized, " > + "cannot attach"))); > + } > + > + if (SlotSyncWorker->hdr.proc) > + { > + LWLockRelease(SlotSyncWorkerLock); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("replication slot-sync worker is " > + "already running, cannot attach"))); > + } > > Using slot-sync in the error messages looks a bit odd to me. Can we use > "replication slot sync worker ..." in both these and other similar messages? I > think it would be better if we don't split the messages into multiple lines in > these cases as messages don't appear too long to me. Changed as suggested. > > 8. > +/* > + * Detach the worker from DSM and update 'proc' and 'in_use'. > + * Logical replication launcher will come to know using these > + * that the worker has shutdown. > + */ > +void > +slotsync_worker_detach(int code, Datum arg) { > > I think the reference to DSM is leftover from the previous version of the patch. > Can we change the above comments as per the new code? Changed. > > 9. > +static bool > +slotsync_worker_launch() > { > ... > + /* TODO: do we really need 'generation', analyse more here */ > + worker->hdr.generation++; > > We should do something about this TODO. As per my understanding, we don't > need a generation number for the slot sync worker as we have one such worker > but I guess the patch requires it because we are using existing logical > replication worker infrastructure. This brings the question of whether we really > need a separate SlotSyncWorkerInfo or if we can use existing > LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you > didn't use it because most of the fields in LogicalRepWorker will be unused for > slot sync worker. Will think about this one and update in next version. > > 10. > + * Can't use existing functions like 'get_database_oid' from > +dbcommands.c for > + * validity purpose as they need db connection. > + */ > +static bool > +validate_dbname(const char *dbname) > > I don't know how important it is to validate the dbname before launching the > sync slot worker because anyway after launching, it will give an error while > initializing the connection if the dbname is invalid. But, if we think it is really > required, did you consider using GetDatabaseTuple()? Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it possible to release the restriction for the dbname. For example, slot sync worker could always connect to the 'template1' as the worker doesn't update the database objects. Although I didn't find some examples on server side, but some client commands(e.g. pg_upgrade) will connect to template1 to check some global objects. (Just FYI, the previous version patch used a replication command which may avoid the dbname but was replaced with SELECT to improve the flexibility and avoid introducing new command.) Best Regards, Hou zj
On Mon, Nov 20, 2023 at 6:51 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/20/23 11:59 AM, Amit Kapila wrote: > > On Mon, Nov 20, 2023 at 3:17 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> On 11/18/23 11:45 AM, Amit Kapila wrote: > >>> On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > >>> <bertranddrouvot.pg@gmail.com> wrote: > >>>> > >>>> On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: > >>>>> On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > >>>>> > >>>>> I feel the WaitForWALToBecomeAvailable may not be the best place to shutdown > >>>>> slotsync worker and drop slots. There could be other reasons(other than > >>>>> promotion) as mentioned in comments in case XLOG_FROM_STREAM to reach the code > >>>>> there. I thought if the intention is to stop slotsync workers on promotion, > >>>>> maybe FinishWalRecovery() is a better place to do it as it's indicating the end > >>>>> of recovery and XLogShutdownWalRcv is also called in it. > >>>> > >>>> I can see that slotsync_drop_initiated_slots() has been moved in FinishWalRecovery() > >>>> in v35. That looks ok. > >>>>> > >>> > >>> I was thinking what if we just ignore creating such slots (which > >>> require init state) in the first place? I think that can be > >>> time-consuming in some cases but it will reduce the complexity and we > >>> can always improve such cases later if we really encounter them in the > >>> real world. I am not very sure that added complexity is worth > >>> addressing this particular case, so I would like to know your and > >>> others' opinions. > >>> > >> > >> I'm not sure I understand your point. Are you saying that we should not create > >> slots on the standby that are "currently" reported in a 'i' state? (so just keep > >> the 'r' and 'n' states?) > >> > > > > Yes. > > > > As far the 'i' state here, from what I see, it is currently useful for: > > 1. Cascading standby to not sync slots with state = 'i' from > the first standby. > 2. Easily report Slots that did not catch up on the primary yet. > 3. Avoid inactive slots to block "active" ones creation. > > So not creating those slots should not be an issue for 1. (sync are > not needed on cascading standby as not created on the first standby yet) > but is an issue for 2. (unless we provide another way to keep track and report > such slots) and 3. (as I think we should still need to reserve WAL). > > I've a question: we'd still need to reserve WAL for those slots, no? > > If that's the case and if we don't call ReplicationSlotCreate() then ReplicationSlotReserveWal() > would not work as MyReplicationSlot would be NULL. > Yes, we need to reserve WAL to see if we can sync the slot. We are currently creating an RS_EPHEMERAL slot and if we don't explicitly persist it when we can't sync, then it will be dropped when we do ReplicationSlotRelease() at the end of synchronize_one_slot(). So, the loss is probably, the next time we again try to sync the slot, we need to again create it and may need to wait for newer restart_lsn on standby which could be avoided if we have the slot in 'i' state from the previous run. I don't deny the importance of having 'i' (initialized) state but was just trying to say that it has additional code complexity. OTOH, having it may give better visibility to even users about slots that are not active (say manually created slots on the primary). -- With Regards, Amit Kapila.
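To make the trade-off concrete, here is a rough sketch of the create-ephemeral-then-persist flow described above. This is not the patch's synchronize_one_slot(); the RemoteSlotSketch type and its fields are assumptions for illustration, and ReplicationSlotCreate() is shown with its pre-patch argument list.

#include "postgres.h"

#include "replication/slot.h"

/* Illustrative stand-in for the patch's remote-slot info; fields assumed. */
typedef struct RemoteSlotSketch
{
	char	   *name;
	bool		two_phase;
	XLogRecPtr	restart_lsn;
} RemoteSlotSketch;

static void
sync_one_slot_sketch(RemoteSlotSketch *remote_slot)
{
	/* The local copy starts out ephemeral. */
	ReplicationSlotCreate(remote_slot->name, true /* db_specific */ ,
						  RS_EPHEMERAL, remote_slot->two_phase);

	/* Reserve WAL so the local slot gets a valid restart_lsn. */
	ReplicationSlotReserveWal();

	/*
	 * Persist only once the remote slot has passed the locally reserved
	 * position; otherwise releasing the ephemeral slot below drops it, and
	 * the next sync cycle has to start over, as described above.
	 */
	if (remote_slot->restart_lsn >= MyReplicationSlot->data.restart_lsn)
		ReplicationSlotPersist();

	ReplicationSlotRelease();
}

The alternative being discussed (the 'i' state) persists the slot even when it has not caught up yet, so the next sync cycle can pick up where it left off instead of starting from scratch.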
Here are some review comments for the patch v35-0001. ====== 0. GENERAL documentation I felt that the documentation gave details of the individual changes (e.g. GUC 'standby_slot_names' and API, CREATE SUBSCRIPTION option, and pg_replication_slots 'failover' attribute etc.) but there is nothing that seemed to bring all these parts together to give examples for user "when" and "how" to make all these parts work. I'm not sure if there is some overview missing from this patch 00001 or if you are planning that extra documentation for subsequent patches. ====== Commit message 1. A new property 'failover' is added at the slot level which is persistent information which specifies that this logical slot is enabled to be synced to the physical standbys so that logical replication can be resumed after failover. It is always false for physical slots. ~ SUGGESTION A new property 'failover' is added at the slot level. This is persistent information to indicate that this logical slot... ~~~ 2. Users can set it during the create subscription or during pg_create_logical_replication_slot. Examples: create subscription mysub connection '..' publication mypub WITH (failover = true); --last arg SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput', false, true, true); ~ 2a. Add a blank line before this ~ 2b. Use uppercase for the SQL ~ 2c. SUGGESTION Users can set this flag during CREATE SUBSCRIPTION or during pg_create_logical_replication_slot API. Ex1. CREATE SUBSCRIPTION mysub CONNECTION '...' PUBLICATION mypub WITH (failover = true); Ex2. (failover is the last arg) SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput', false, true, true); ~~~ 3. This 'failover' is displayed as part of pg_replication_slots view. ~ SUGGESTION The value of the 'failover' flag is displayed as part of pg_replication_slots view. ~~~ 4. A new GUC standby_slot_names has been added. It is the list of physical replication slots that logical replication with failover enabled waits for. The intent of this wait is that no logical replication subscribers (with failover=true) should go ahead of physical replication standbys (corresponding to the physical slots in standby_slot_names). ~ 4a. SUGGESTION A new GUC standby_slot_names has been added. This is a list of physical replication slots that logical replication with failover enabled will wait for. ~ 4b. /no logical replication subscribers/no logical replication subscriptions/ ~ 4c /should go ahead of physical/should get ahead of physical/ ====== contrib/test_decoding/sql/slot.sql 5. + +-- Test logical slots creation with 'failover'=true (last arg) +SELECT 'init' FROM pg_create_logical_replication_slot('failover_slot', 'test_decoding', false, false, true); +SELECT slot_name, slot_type, failover FROM pg_replication_slots; + +SELECT pg_drop_replication_slot('failover_slot'); How about a couple more simple tests: a) pass false arg to confirm it is false in the view. b) according to the docs this failover is optional, so try API without passing it c) create a physical slot to confirm it is false in the view. ====== doc/src/sgml/catalogs.sgml 6. 
+ <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>subfailoverstate</structfield> <type>char</type> + </para> + <para> + State codes for failover mode: + <literal>d</literal> = disabled, + <literal>p</literal> = pending enablement, + <literal>e</literal> = enabled + </para></entry> + </row> + This attribute is very similar to the 'subtwophasestate' so IMO it would be better to be adjacent to that one in the docs. (probably this means putting it in the same order in the catalog also, assuming that is allowed) ====== doc/src/sgml/config.sgml 7. + <para> + List of physical replication slots that logical replication slots with + failover enabled waits for. If a logical replication connection is + meant to switch to a physical standby after the standby is promoted, + the physical replication slot for the standby should be listed here. + </para> + <para> + The standbys corresponding to the physical replication slots in + <varname>standby_slot_names</varname> must enable + <varname>enable_syncslot</varname> for the standbys to receive + failover logical slots changes from the primary. + </para> That sentence mentioning 'enable_syncslot' seems premature because AFAIK that GUC is not introduced until patch 0002. So this part should be moved into the 0002 patch. ====== doc/src/sgml/ref/alter_subscription.sgml 8. These commands also cannot be executed when the subscription has <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link> - commit enabled, unless + commit enabled or + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + enabled, unless <link linkend="sql-createsubscription-params-with-copy-data"><literal>copy_data</literal></link> is <literal>false</literal>. See column <structfield>subtwophasestate</structfield> - of <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link> + and <structfield>subfailoverstate</structfield> of + <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link> to know the actual two-phase state. I think the last sentence doesn't make sense anymore because it is no longer talking about only two-phase state. BEFORE See column subtwophasestate and subfailoverstate of pg_subscription to know the actual two-phase state. SUGGESTION See column subtwophasestate and subfailoverstate of pg_subscription to know the actual states. ====== doc/src/sgml/ref/create_subscription.sgml 9. + + <varlistentry id="sql-createsubscription-params-with-failover"> + <term><literal>failover</literal> (<type>boolean</type>)</term> + <listitem> + <para> + Specifies whether the replication slot assocaited with the subscription + is enabled to be synced to the physical standbys so that logical + replication can be resumed from the new primary after failover. + The default is <literal>false</literal>. + </para> + + <para> + The implementation of failover requires that replication + has successfully finished the initial table synchronization + phase. So even when <literal>failover</literal> is enabled for a + subscription, the internal failover state remains + temporarily <quote>pending</quote> until the initialization phase + completes. See column <structfield>subfailoverstate</structfield> + of <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link> + to know the actual failover state. + </para> + + </listitem> + </varlistentry> 9a. /assocaited/associated/ ~ 9b. 
Unnecessary blank line before </listitem> ====== src/backend/commands/subscriptioncmds.c 10. #define SUBOPT_ORIGIN 0x00004000 +#define SUBOPT_FAILOVER 0x00008000 Bad indentation ~~~ 11. CreateSubscription + /* + * If only the slot_name is specified, it is possible that the user intends to + * use an existing slot on the publisher, so here we enable failover for the + * slot if requested. + */ + else if (opts.slot_name && failover_enabled) + { + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover); + ereport(NOTICE, + (errmsg("enabled failover for replication slot \"%s\" on publisher", + opts.slot_name))); + } 11a. How does this code ensure that *only* slot_name was set (e.g the comment says "only the slot_name is specified") ~ 11b. Should 3rd arg to walrcv_alter_slot be 'failover_enabled', or maybe just 'true'? ~~~ 12. AlterSubscription + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover is enabled"), + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); 12a. This should have a comment like what precedes the sub->twophasestate error. Or maybe group them both and use the same common comment. ~ 12b. AFAIK when there are messages like this that differ only by non-translatable things ("failover" option) then that non-translatable thing should be extracted as a parameter so the messages are common. And, don't forget to add a /* translator: %s is a subscription option like 'failover' */ comment. SUGGESTION like: errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when %s is enabled", "two_phase") errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when %s is enabled", "failover") ~~~ 13. + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover is enabled"), + /* translator: %s is an SQL ALTER command */ + errhint("Use %s with refresh = false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION.", + isadd ? + "ALTER SUBSCRIPTION ... ADD PUBLICATION" : + "ALTER SUBSCRIPTION ... DROP PUBLICATION"))); Same comment as above #12b. SUGGESTION like: errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when %s is enabled", "two_phase") errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when %s is enabled", "failover") ~~~ 14. + /* + * See comments above for twophasestate, same holds true for + * 'failover' + */ + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed when failover is enabled"), + errhint("Use ALTER SUBSCRIPTION ... REFRESH with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); IMO this is another message where the option should be extracted to make a common message for the translators. And don't forget to add a /* translator: %s is a subscription option like 'failover' */ comment. SUGGESTION like: errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed when %s is enabled", "two_phase"), errmsg("ALTER SUBSCRIPTION ... 
REFRESH with copy_data is not allowed when %s is enabled", "failover"), ====== .../libpqwalreceiver/libpqwalreceiver.c 15. libpqrcv_create_slot + if (failover) + { + appendStringInfoString(&cmd, "FAILOVER"); + if (use_new_options_syntax) + appendStringInfoString(&cmd, ", "); + else + appendStringInfoChar(&cmd, ' '); + } 15a. Isn't failover a new option that is unsupported pre-PG17? Why is it necessary to support an old-style syntax for something that was not supported on old servers? (I'm confused). ~ 15b. Also IIRC, this FAILOVER wasn't listed in the old-style syntax of doc/src/sgml/protocol.sgml. Was that deliberate? ====== .../replication/logical/logicalfuncs.c 16. pg_logical_slot_get_changes_guts + if (XLogRecPtrIsInvalid(upto_lsn)) + wal_to_wait = end_of_wal; + else + wal_to_wait = Min(upto_lsn, end_of_wal); + + /* + * Wait for specified streaming replication standby servers (if any) + * to confirm receipt of WAL upto wal_to_wait. + */ + WalSndWaitForStandbyConfirmation(wal_to_wait); + 16a. /WAL upto wal_to_wait./WAL up to wal_to_wait./ ~ 16b. Is there another name for this variable (wal_to_wait) that conveys more meaning? Maybe 'wal_received_pos' or 'wait_for_wal_lsn' or something better. ====== src/backend/replication/logical/tablesync.c 17. process_syncing_tables_for_apply CommandCounterIncrement(); /* make updates visible */ if (AllTablesyncsReady()) { + char buf[100]; + + buf[0] = '\0'; + + if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING) + strcat(buf, "twophase"); + if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING) + { + if (buf[0] != '\0') + strcat(buf, " and "); + strcat(buf, "failover"); + } + ereport(LOG, - (errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled", - MySubscription->name))); + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that %s can be enabled", + MySubscription->name, buf))); should_exit = true; } ~ IMO you cannot build up a log buffer using " and " like this because the translation would be a mess. IIUC, you might have to do it the long way with multiple errmsg. SUGGESTION twophase_pending = MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING; failover_pending = MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING; if (twophase_pending || failover_pending) ereport(LOG, twophase_pending && failover_pending /* translator: 'two_phase' or 'failover' are subscription options */ ? errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase and failover can be enabled", MySubscription->name) : errmsg("logical replication apply worker for subscription \"%s\" will restart so that %s can be enabled", MySubscription->name, twophase_pending ? "two_phase" : "failover")); ~~~ 18. UpdateTwoPhaseFailoverStates -UpdateTwoPhaseState(Oid suboid, char new_state) +UpdateTwoPhaseFailoverStates(Oid suboid, + bool update_twophase, char new_state_twophase, + bool update_failover, char new_state_failover) Although this function is written to update to *any* specified state, in practice it only ever seems called to update from PENDING to ENABLE state and nothing else. Therefore it can be simplified by not even passing those states, and by changing the function name like 'EnableTwoPhaseFailoverTriState' ====== src/backend/replication/logical/worker.c 19. File header comment There is a lot of talk here about two_phase tri-state and the special ALTER REFRESH considerations for the two-phase transactions.
IIUC, there should be lots of similar commentary for the failover tri-sate and ALTER REFRESH. ~~~ 20. * PENDING, which allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to * work. */ - if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING && - AllTablesyncsReady()) + twophase_pending = (MySubscription->twophasestate + == LOGICALREP_TWOPHASE_STATE_PENDING) ? true : false; + failover_pending = (MySubscription->failoverstate + == LOGICALREP_FAILOVER_STATE_PENDING) ? true : false; + The comment preceding this is only talking about 'two_phase', so should be expanded to mention also 'failover' ~~~ 21. run_apply_worker + twophase_pending = (MySubscription->twophasestate + == LOGICALREP_TWOPHASE_STATE_PENDING) ? true : false; + failover_pending = (MySubscription->failoverstate + == LOGICALREP_FAILOVER_STATE_PENDING) ? true : false; These ternaries are not necessary. SUGGESTION (has the same meaning) twophase_pending = (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING); failover_pending = (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING); ~~~ 22. - UpdateTwoPhaseState(MySubscription->oid, LOGICALREP_TWOPHASE_STATE_ENABLED); - MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED; + + /* Update twophase and/or failover */ + if (twophase_pending || failover_pending) + UpdateTwoPhaseFailoverStates(MySubscription->oid, + twophase_pending, + LOGICALREP_TWOPHASE_STATE_ENABLED, + failover_pending, + LOGICALREP_FAILOVER_STATE_ENABLED); + if (twophase_pending) + MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED; + + if (failover_pending) + MySubscription->failoverstate = LOGICALREP_FAILOVER_STATE_ENABLED; There seem rather too many checks for 'twophase_pending' and 'failover_pending'. With some refactoring this could be done with less code I think. For example, 1. Unconditionally call UpdateTwoPhaseFailoverStates() but just quick return if nothing to do 2. Pass address of MySubscription->twophasestate/failoverstate, and let function UpdateTwoPhaseFailoverStates() set those ====== src/backend/replication/slot.c 23. +char *standby_slot_names; +static List *standby_slot_names_list = NIL; Should there be a comment for the new GUC? ~~~ 24. ReplicationSlotAlter +/* + * Change the definition of the slot identified by the passed in name. + */ +void +ReplicationSlotAlter(const char *name, bool failover) /passed in/specified/ /the definition/the failover state/ ~~~ 25. validate_standby_slots + +/* + * A helper function to validate slots specified in standby_slot_names GUCs. + */ +static bool +validate_standby_slots(char **newval) /in standby_slot_names GUCs./in GUC standby_slot_names./ ~ 26. validate_standby_slots + /* + * Verify 'type' of slot now. + * + * Skip check if replication slots' data is not initialized yet i.e. we + * are in startup process. + */ + if (!ReplicationSlotCtl) + return true; 26a. This code seems to neglect doing memory cleanup. + pfree(rawname); + list_free(elemlist); ~ 26b. Indeed, most of this function's return points seem to be neglecting some memory cleanup, so IMO it would be better to write this function with some common goto labels that do all this common cleanup: SUGGESTION ret_standby_slot_names_ok: pfree(rawname); list_free(elemlist); return true; ret_standby_slot_names_ng: pfree(rawname); list_free(elemlist); return false; ~ 27. 
validate_standby_slots + if (SlotIsLogical(slot)) + { + GUC_check_errdetail("cannot have logical replication slot \"%s\" " + "in this parameter", name); + list_free(elemlist); + return false; + } IIUC, the GUC is for physical replication slots only, so somehow I felt it was better to keep everything from that (physical) perspective. YMMV. SUGGESTION if (!SlotIsPhysical(slot)) { GUC_check_errdetail("\"%s\" is not a physical replication slot", name); list_free(elemlist); return false; } ~~~ 28. check_standby_slot_names +bool +check_standby_slot_names(char **newval, void **extra, GucSource source) +{ + if (strcmp(*newval, "") == 0) + return true; + + /* + * "*" is not accepted as in that case primary will not be able to know + * for which all standbys to wait for. Even if we have physical-slots + * info, there is no way to confirm whether there is any standby + * configured for the known physical slots. + */ + if (strcmp(*newval, "*") == 0) + { + GUC_check_errdetail("\"%s\" is not accepted for standby_slot_names", + *newval); + return false; + } + + /* Now verify if the specified slots really exist and have correct type */ + if (!validate_standby_slots(newval)) + return false; + + *extra = guc_strdup(ERROR, *newval); + + return true; +} Is it really necessary to have a special test for the special value "*" which you are going to reject? I don't see why this should be any different from checking for other values like "." or "$" or "?" etc. Why not just let validate_standby_slots() handle all of these? ~~~ 29. assign_standby_slot_names + /* No value is specified for standby_slot_names. */ + if (standby_slot_names_cpy == NULL) + return; Is this possible? IIUC the check_standby_slot_names() did: *extra = guc_strdup(ERROR, *newval); Maybe this code also needs a similar elog and comment like already in this function: /* This should not happen if GUC checked check_standby_slot_names. */ ~ 30. assign_standby_slot_names + char *standby_slot_names_cpy = extra; IIUC, the 'extra' was unconditionally guc_strdup()'ed in the check hook, so should we also free it here before leaving this function? ~~~ 31. GetStandbySlotList +/* + * Return a copy of standby_slot_names_list if the copy flag is set to true, + * otherwise return the original list. + */ +List * +GetStandbySlotList(bool copy) +{ + if (copy) + return list_copy(standby_slot_names_list); + else + return standby_slot_names_list; +} Why is this better than just exposing the standby_slot_names_list. The caller can make a copy or not. e.g. why is calling GetStandbySlotList(true) better than just doing list_copy(standby_slot_names_list)? ====== src/backend/replication/walsender.c 32. parseCreateReplSlotOptions static void WalSndSegmentOpen(XLogReaderState *state, XLogSegNo nextSegNo, TimeLineID *tli_p); - /* Initialize walsender process before entering the main command loop */ void ~ Unnecessary changing of whitespace unrelated to this patch. ~~~ 33. WalSndWakeupNeeded +/* + * Does this Wal Sender need to wake up logical walsender. + * + * Check if the physical slot of this walsender is specified in + * standby_slot_names GUC. + */ +static bool +WalSndWakeupNeeded() /Wal Sender/physical walsender process/ (maybe??) ~~~ 34. WalSndFilterStandbySlots + /* Log warning if no active_pid for this physical slot */ + if (slot->active_pid == 0) + ereport(WARNING, Other nearby code is guarding the slot in case it was NULL, so why not here? Is it a potential NPE? ~~~ 35. + /* + * If logical slot name is given in standby_slot_names, give WARNING + * and skip it. 
Since it is harmless, so WARNING should be enough, no + * need to error-out. + */ + else if (SlotIsLogical(slot)) + warningfmt = _("cannot have logical replication slot \"%s\" in parameter \"%s\", ignoring"); Is this possible? Doesn't the function 'validate_standby_slots' called by the GUC hook prevent specifying logical slots in the GUC? Maybe this warning should be changed to Assert? ~~~ 36. + /* + * Reaching here indicates that either the slot has passed the + * wait_for_lsn or there is an issue with the slot that requires a + * warning to be reported. + */ + if (warningfmt) + ereport(WARNING, errmsg(warningfmt, name, "standby_slot_names")); + + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc); If something was wrong with the slot that required a warning, is it really OK to remove this slot from the list? This seems contrary to the function comment which only talks about removing slots that have caught up. ~~~ 37. WalSndWaitForStandbyConfirmation +/* + * Wait for physical standby to confirm receiving given lsn. + * + * Here logical walsender associated with failover logical slot waits + * for physical standbys corresponding to physical slots specified in + * standby_slot_names GUC. + */ /given/the given/ ~~~ 38. WalSndWaitForStandbyConfirmation + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); + + for (;;) This ConditionVariablePrepareToSleep was already called in the WalSndWait() function. Did it need to be called 2 times? ~~~ 39. + WalSndFilterStandbySlots(wait_for_lsn, &standby_slots); + + /* Exit if done waiting for every slot. */ + if (standby_slots == NIL) + break; + + CHECK_FOR_INTERRUPTS(); + + if (ConfigReloadPending) + { + ConfigReloadPending = false; + WalSndRereadConfigAndSlots(&standby_slots); + } Shouldn't all the config reload stuff come first before the filter and NIL check, just in case after the reload there is nothing to do? Otherwise, it might cause unnecessary sleep. ~~~ 40. WalSndWaitForWal /* * Wait till WAL < loc is flushed to disk so it can be safely sent to client. * - * Returns end LSN of flushed WAL. Normally this will be >= loc, but - * if we detect a shutdown request (either from postmaster or client) - * we will return early, so caller must always check. + * If the walsender holds a logical slot that has enabled failover, the + * function also waits for all the specified streaming replication standby + * servers to confirm receipt of WAL upto RecentFlushPtr. + * + * Returns end LSN of flushed WAL. Normally this will be >= loc, but if we + * detect a shutdown request (either from postmaster or client) we will return + * early, so caller must always check. */ static XLogRecPtr WalSndWaitForWal(XLogRecPtr loc) ~ /upto/up to/ ~~~ 41. /* - * Fast path to avoid acquiring the spinlock in case we already know we - * have enough WAL available. This is particularly interesting if we're - * far behind. + * Check if all the standby servers have confirmed receipt of WAL upto + * RecentFlushPtr if we already know we have enough WAL available. + * + * Note that we cannot directly return without checking the status of + * standby servers because the standby_slot_names may have changed, which + * means there could be new standby slots in the list that have not yet + * caught up to the RecentFlushPtr. */ if (RecentFlushPtr != InvalidXLogRecPtr && loc <= RecentFlushPtr) - return RecentFlushPtr; + { + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots); 41a. /upto/up to/ ~ 41b. 
IMO there is some missing information in this comment because it wasn't clear to me that calling WalSndFilterStandbySlots was going to side-efect that list to give it a different meaning. e.g. it seems it no longer means "standby slots" but instead means something like "standby slots that are not caught up". Perhaps that local variable can have a name that helps to convey that better? ~~~ 42. + /* + * Fast path to entering the loop in case we already know we have + * enough WAL available and all the standby servers has confirmed + * receipt of WAL upto RecentFlushPtr. This is particularly + * interesting if we're far behind. + */ + if (standby_slots == NIL) + return RecentFlushPtr; 42a. /has/have/ ~ 42b. For entering what loop? There's no context for this comment. I assume it means the loop that comes later in this function, but then isn't this a typo? /Fast path to entering the loop/Fast path to avoid entering the loop/. Alternatively, just don't even mention the loop - just say "Quick return" etc. ~~~ 43. WalSndWait -WalSndWait(uint32 socket_events, long timeout, uint32 wait_event) +WalSndWait(uint32 socket_events, long timeout, uint32 wait_event, + bool wait_for_standby) Does this need the 'wait_for_standby' parameter? AFAICT this was only set true when the event enum was WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION, so why do we need an extra boolean to be passed when there is already enough information in the event to know when it is waiting for standby? ~~~ 44. + if (wait_for_standby) + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); + else if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL) ConditionVariablePrepareToSleep(&WalSndCtl->wal_flush_cv); else if (MyWalSnd->kind == REPLICATION_KIND_LOGICAL) ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv); ~ A walsender is either physical or logical, but here the 'wait_for_standby' flag overrides everything. Is it OK for this to be if/else/else or should this code call for wal_confirm_rcv_cv AND the other one? e.g. The function comment for WalSndWaitForWal said "the function also waits..." ====== src/backend/utils/misc/guc_tables.c 45. + {"standby_slot_names", PGC_SIGHUP, REPLICATION_PRIMARY, + gettext_noop("List of streaming replication standby server slot " + "names that logical walsenders waits for."), /walsenders waits for./walsender processes will wait for./ ====== src/backend/utils/misc/postgresql.conf.sample 46. +#standby_slot_names = '' # streaming replication standby server slot names that + # logical walsenders waits for (same as the msg in guc_tables) /walsenders waits for/walsender processes will wait for/ ====== src/bin/pg_upgrade/info.c 47. @@ -681,6 +681,7 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check) int i_twophase; int i_caught_up; int i_invalid; + int i_failover; ~ IMO it would be better if all these were coded to use the same order as the SQL -- so put each of the "failover" code immediately after the 'two_phase" code. ~~~ 48. @@ -689,6 +690,7 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check) i_twophase = PQfnumber(res, "two_phase"); i_caught_up = PQfnumber(res, "caught_up"); i_invalid = PQfnumber(res, "invalid"); + i_failover = PQfnumber(res, "failover"); ~ ditto #47. ~~~ 49. 
@@ -699,6 +701,7 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check) curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0); curr->caught_up = (strcmp(PQgetvalue(res, slotnum, i_caught_up), "t") == 0); curr->invalid = (strcmp(PQgetvalue(res, slotnum, i_invalid), "t") == 0); + curr->failover = (strcmp(PQgetvalue(res, slotnum, i_failover), "t") == 0); ~ ditto #47. ====== src/bin/pg_upgrade/pg_upgrade.c 50. + if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700) + appendPQExpBuffer(query, ", false, %s, %s);", + slot_info->two_phase ? "true" : "false", + slot_info->failover ? "true" : "false"); + else + appendPQExpBuffer(query, ", false, %s);", + slot_info->two_phase ? "true" : "false"); IMO this would be easier to read if it was written the other way around like if (GET_MAJOR_VERSION(new_cluster.major_version) < 1700) ... old args else ... new args ====== src/bin/pg_upgrade/pg_upgrade.h 51. + bool failover; /* is the slot designated to be synced + * to the physical standby */ } LogicalSlotInfo; The comment is missing a question mark (?) which the others have. ====== src/bin/psql/describe.c 52. ", suborigin AS \"%s\"\n" ", subpasswordrequired AS \"%s\"\n" - ", subrunasowner AS \"%s\"\n", + ", subrunasowner AS \"%s\"\n" + ", subfailoverstate AS \"%s\"\n", gettext_noop("Origin"), gettext_noop("Password required"), - gettext_noop("Run as owner?")); + gettext_noop("Run as owner?"), + gettext_noop("Enable failover?")); I don't think "Enable failover?" should have a question mark. IMO "run as owner?" is the odd one out so should not have been copied. Anyway, the subfailoverstate is a 'state' rather than a simple boolean, so it should be more like subtwophasestate than anything else. ====== src/bin/psql/tab-complete.c 53. COMPLETE_WITH("binary", "connect", "copy_data", "create_slot", "disable_on_error", "enabled", "origin", "password_required", "run_as_owner", "slot_name", - "streaming", "synchronous_commit", "two_phase"); + "streaming", "synchronous_commit", "two_phase", + "failover"); All these tab completion options are supposed to be in alphabetical order, so this 'failover' has been added in the wrong position. ====== src/include/catalog/pg_subscription.h 54. /* * two_phase tri-state values. See comments atop worker.c to know more about * these states. */ #define LOGICALREP_TWOPHASE_STATE_DISABLED 'd' #define LOGICALREP_TWOPHASE_STATE_PENDING 'p' #define LOGICALREP_TWOPHASE_STATE_ENABLED 'e' #define LOGICALREP_FAILOVER_STATE_DISABLED 'd' #define LOGICALREP_FAILOVER_STATE_PENDING 'p' #define LOGICALREP_FAILOVER_STATE_ENABLED 'e' ~ 54a. There should either be another comment (like the 'two_phase tri-state' one) added for the FAILOVER states or that existing comment should be expanded so that it also mentions the 'failover' tri-states. ~ 54b. Idea: If you are willing to change the constant names (not the values) of the current tri-states then now both the 'two_phase' and 'failover' could share them -- I also think this might give the ability to create macros (if wanted) or to share more code instead of always handling failover and two_phase separately. SUGGESTION #define LOGICALREP_TRISTATE_DISABLED 'd' #define LOGICALREP_TRISTATE_PENDING 'p' #define LOGICALREP_TRISTATE_ENABLED 'e' ~ 54c. The header comment at the top of worker.c should give more details about the 'failover' tri-state. (also mentioned in another review comment) ~~~ 55.
FormData_pg_subscription + char subfailoverstate; /* Enable Failover State */ + /Enable Failover State/Failover state/ ====== src/include/replication/slot.h 56. + + /* + * Is this a failover slot (sync candidate for physical standbys)? + * Relevant for logical slots on the primary server. + */ + bool failover; } ReplicationSlotPersistentData; ~ /Relevant/Only relevant/ ====== src/include/replication/walreceiver.h 57. +#define walrcv_create_slot(conn, slotname, temporary, two_phase, failover, snapshot_action, lsn) \ + WalReceiverFunctions->walrcv_create_slot(conn, slotname, temporary, two_phase, failover, snapshot_action, lsn) double whitespace after the 'failover' parameter? ====== src/include/replication/walsender_private.h 58. ConditionVariable wal_flush_cv; ConditionVariable wal_replay_cv; + ConditionVariable wal_confirm_rcv_cv; Should this new field have a comment? Or should it be grouped with the 2 preceding fields (if that same group comment is valid for all of them)? ====== Kind Regards, Peter Smith. Fujitsu Australia
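Regarding comment 0 at the top of this review, the kind of end-to-end example the documentation could give might look like the sketch below; connection strings and object names are illustrative, and the option/GUC names are the ones used by this patch version, so they may still change.

-- On the primary: make failover-enabled logical walsenders wait for the
-- physical standby's slot.
ALTER SYSTEM SET standby_slot_names = 'physical_standby_slot';
SELECT pg_reload_conf();

-- On the subscriber: create the subscription with failover enabled.
CREATE SUBSCRIPTION mysub CONNECTION 'host=primary dbname=postgres'
    PUBLICATION mypub WITH (failover = true);

-- Back on the primary: the subscription's slot is now marked as a failover
-- candidate.
SELECT slot_name, slot_type, failover FROM pg_replication_slots;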
On Tue, Nov 21, 2023 at 10:01 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Saturday, November 18, 2023 6:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > On 11/17/23 2:46 AM, Zhijie Hou (Fujitsu) wrote: > > > > On Tuesday, November 14, 2023 10:27 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > I feel the WaitForWALToBecomeAvailable may not be the best place to > > > > shutdown slotsync worker and drop slots. There could be other > > > > reasons(other than > > > > promotion) as mentioned in comments in case XLOG_FROM_STREAM to > > > > reach the code there. I thought if the intention is to stop slotsync > > > > workers on promotion, maybe FinishWalRecovery() is a better place to > > > > do it as it's indicating the end of recovery and XLogShutdownWalRcv is also > > called in it. > > > > > > I can see that slotsync_drop_initiated_slots() has been moved in > > > FinishWalRecovery() in v35. That looks ok. > > > > > > > > > > More Review for v35-0002* > > Thanks for the comments. > > > ============================ > > 1. > > + ereport(WARNING, > > + errmsg("skipping slots synchronization as primary_slot_name " > > + "is not set.")); > > > > There is no need to use a full stop at the end for WARNING messages and as > > previously mentioned, let's not split message lines in such cases. There are > > other messages in the patch with similar problems, please fix those as well. > > Adjusted. > > > > > 2. > > +slotsync_checks() > > { > > ... > > ... > > + /* The hot_standby_feedback must be ON for slot-sync to work */ if > > + (!hot_standby_feedback) { ereport(WARNING, errmsg("skipping slots > > + synchronization as hot_standby_feedback " > > + "is off.")); > > > > This message has the same problem as mentioned in the previous comment. > > Additionally, I think either atop slotsync_checks or along with GUC check we > > should write comments as to why we expect these values to be set for slot sync > > to work. > > Added comments for these cases. > > > > > 3. > > + /* The worker is running already */ > > + if (SlotSyncWorker &&SlotSyncWorker->hdr.in_use && > > + SlotSyncWorker->hdr.proc) > > > > The spacing for both the &&'s has problems. You need a space after the first > > && and the second && should be in the prior line. > > Adjusted. > > > > > 4. > > + LauncherRereadConfig(&recheck_slotsync); > > + > > } > > > > An empty line after LauncherRereadConfig() is not required. > > > > 5. > > +static void > > +LauncherRereadConfig(bool *ss_recheck) > > +{ > > + char *conninfo = pstrdup(PrimaryConnInfo); > > + char *slotname = pstrdup(PrimarySlotName); > > + bool syncslot = enable_syncslot; > > + bool feedback = hot_standby_feedback; > > > > Can we change the variable name 'feedback' to 'standbyfeedback' to make it > > slightly more descriptive? > > Changed. > > > > > 6. The logic to recheck the slot_sync related parameters in > > LauncherMain() is not very clear. IIUC, if after reload config any parameter is > > changed, we just seem to be checking the validity of the changed parameter > > but not restarting the slot sync worker, is that correct? If so, what if dbname is > > changed, don't we need to restart the slot-sync worker and re-initialize the > > connection; similarly slotname change also needs some thoughts. 
Also, if all the > > parameters are valid we seem to be re-launching the slot-sync worker without > > first stopping it which doesn't seem correct, am I missing something in this > > logic? > > I think the slot sync worker will be stopped in LauncherRereadConfig() if GUC changed > and new slot sync worker will be started in next loop in LauncherMain(). Yes, LauncherRereadConfig will stop the worker on any parameter change and will set recheck_slotsync(). On finding this flag set, LauncherMain will redo all the validations and restart the slot-sync worker if needed. Yes, we do need to stop and relaunch the slot-sync worker on a dbname change as well; this is currently missing in LauncherRereadConfig(). Regarding a slot name change, we already handle it: PrimarySlotName is checked in LauncherRereadConfig() > > > > 7. > > @@ -524,6 +525,25 @@ CreateDecodingContext(XLogRecPtr start_lsn, > > errmsg("replication slot \"%s\" was not created in this database", > > NameStr(slot->data.name)))); > > > > + in_recovery = RecoveryInProgress(); > > + > > + /* > > + * Do not allow consumption of a "synchronized" slot until the standby > > + * gets promoted. Also do not allow consumption of slots with > > + sync_state > > + * as SYNCSLOT_STATE_INITIATED as they are not synced completely to be > > + * used. > > + */ > > + if ((in_recovery && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) || > > + slot->data.sync_state == SYNCSLOT_STATE_INITIATED) > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("cannot use replication slot \"%s\" for logical decoding", > > + NameStr(slot->data.name)), in_recovery ? > > + errdetail("This slot is being synced from the primary server.") : > > + errdetail("This slot was not synced completely from the primary > > + server."), errhint("Specify another replication slot."))); > > + > > > > If we are planning to drop slots in state SYNCSLOT_STATE_INITIATED at the > > time of promotion, don't we need to just have an assert or elog(ERROR, .. for > > non-recovery cases as such cases won't be reachable? If so, I think we can > > separate out that case here. > > Adjusted the codes as suggested. > > > > > 8. > > wait_for_primary_slot_catchup() > > { > > ... > > + /* Check if this standby is promoted while we are waiting */ if > > + (!RecoveryInProgress()) { > > + /* > > + * The remote slot didn't pass the locally reserved position at > > + * the time of local promotion, so it's not safe to use. > > + */ > > + ereport( > > + WARNING, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg( > > + "slot-sync wait for slot %s interrupted by promotion, " > > + "slot creation aborted", remote_slot->name))); pfree(cmd.data); return > > + false; } > > ... > > } > > > > Shouldn't this be an Assert because a slot-sync worker shouldn't exist for > > non-standby servers? > > Changed to Assert. > > > > > 9. > > wait_for_primary_slot_catchup() > > { > > ... > > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > > + if (!tuplestore_gettupleslot(res->tuplestore, true, false, slot)) { > > + ereport(WARNING, (errmsg("slot \"%s\" disappeared from the primary > > + server," > > + " slot creation aborted", remote_slot->name))); pfree(cmd.data); > > + walrcv_clear_result(res); return false; > > > > If the slot on primary disappears, shouldn't this part of the code somehow > > ensure to remove the slot on standby as well?
If it is taken at some other point > > in time then at least we should write a comment here to state how it is taken > > care of. I think this comment also applies to a few other checks following this > > check. > > I adjusted the code here to not persist the slots if the slot disappeared or invalidated > on primary, so that the local slot will get dropped when releasing. > > > > > 10. > > + /* > > + * It is possible to get null values for lsns and xmin if slot is > > + * invalidated on the primary server, so handle accordingly. > > + */ > > + new_invalidated = DatumGetBool(slot_getattr(slot, 1, &isnull)); > > > > We can say LSN and Xmin in the above comment to make it easier to > > read/understand. > > Changed. > > > > > 11. > > /* > > + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin > > + * are expected to be valid/non-null, so assert if found null. > > + */ > > > > No need to explicitly say about assert, it is clear from the code. We can slightly > > change this comment to: "Once we got valid restart_lsn, then confirmed_lsn > > and catalog_xmin are expected to be valid/non-null." > > Changed. > > > > > 12. > > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || > > + TransactionIdPrecedes(remote_slot->catalog_xmin, > > + MyReplicationSlot->data.catalog_xmin)) > > + { > > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot)) { > > + /* > > + * The remote slot didn't catch up to locally reserved position. > > + * But still persist it and attempt the wait and sync in next > > + * sync-cycle. > > + */ > > + if (MyReplicationSlot->data.persistency != RS_PERSISTENT) { > > + ReplicationSlotPersist(); *slot_updated = true; } > > > > I think the reason to persist in this case is because next time local restart_lsn can > > be ahead than the current location and it can take more time to create such a > > slot. We can probably mention the same in the comments. > > Updated the comments. > > Here is the V37 patch set which addressed comments above and [1]. > > [1] https://www.postgresql.org/message-id/CAA4eK1%2BP9R3GO2rwGBg2EOh%3DuYjWUSEOHD8yvs4Je8WYa2RHag%40mail.gmail.com > > Best Regards, > Hou zj
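As a concrete illustration of the restriction discussed for comment 7 above, trying to consume a synced slot on the standby before promotion would fail roughly as follows; the slot name is illustrative and the messages are taken from the quoted code, so the wording may still change.

-- On the standby, before promotion:
SELECT * FROM pg_logical_slot_get_changes('failover_slot', NULL, NULL);
-- ERROR:  cannot use replication slot "failover_slot" for logical decoding
-- DETAIL:  This slot is being synced from the primary server.
-- HINT:  Specify another replication slot.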
On Tue, Nov 21, 2023 at 10:02 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, November 17, 2023 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > PFA v35. > > > > > > > Review v35-0002* > > ============== > > Thanks for the comments. > > > 1. > > As quoted in the commit message, > > > > > If a logical slot is invalidated on the primary, slot on the standby is also > > invalidated. If a logical slot on the primary is valid but is invalidated on the > > standby due to conflict (say required rows removed on the primary), then that > > slot is dropped and recreated on the standby in next sync-cycle. > > It is okay to recreate such slots as long as these are not consumable on the > > standby (which is the case currently). > > > > > > > I think this won't happen normally because of the physical slot and > > hot_standby_feedback but probably can occur in cases like if the user > > temporarily switches hot_standby_feedback from on to off. Are there any other > > reasons? I think we can mention the cases along with it as well at least for now. > > Additionally, I think this should be covered in code comments as well. > > I will collect all these cases and update in next version. > > > > > 2. > > #include "postgres.h" > > - > > +#include "access/genam.h" > > > > Spurious line removal. > > Removed. > > > > > 3. > > A password needs to be provided too, if the sender demands > > password > > authentication. It can be provided in the > > <varname>primary_conninfo</varname> string, or in a separate > > - <filename>~/.pgpass</filename> file on the standby server (use > > - <literal>replication</literal> as the database name). > > - Do not specify a database name in the > > - <varname>primary_conninfo</varname> string. > > + <filename>~/.pgpass</filename> file on the standby server. > > + </para> > > + <para> > > + Specify <literal>dbname</literal> in > > + <varname>primary_conninfo</varname> string to allow > > synchronization > > + of slots from the primary server to the standby server. > > + This will only be used for slot synchronization. It is ignored > > + for streaming. > > > > Is there a reason to remove part of the earlier sentence "use > > <literal>replication</literal> as the database name"? > > Added it back. > > > > > 4. > > + <primary><varname>enable_syncslot</varname> configuration > > parameter</primary> > > + </indexterm> > > + </term> > > + <listitem> > > + <para> > > + It enables a physical standby to synchronize logical failover slots > > + from the primary server so that logical subscribers are not blocked > > + after failover. > > + </para> > > + <para> > > + It is enabled by default. This parameter can only be set in the > > + <filename>postgresql.conf</filename> file or on the server > > command line. > > + </para> > > > > I think you forgot to update the documentation for the default value of this > > variable. > > Updated. > > > > > 5. > > + * a) start the logical replication workers for every enabled subscription > > + * when not in standby_mode > > + * b) start the slot-sync worker for logical failover slots synchronization > > + * from the primary server when in standby_mode. > > > > Either use a full stop after both lines or none of these. > > Added a full stop. > > > > > 6. > > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker); > > > > There shouldn't be space between * and the worker. > > Removed, and added the type to typedefs.list. 
> > > > > 7. > > + if (!SlotSyncWorker->hdr.in_use) > > + { > > + LWLockRelease(SlotSyncWorkerLock); > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("replication slot-sync worker not initialized, " > > + "cannot attach"))); > > + } > > + > > + if (SlotSyncWorker->hdr.proc) > > + { > > + LWLockRelease(SlotSyncWorkerLock); > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("replication slot-sync worker is " > > + "already running, cannot attach"))); > > + } > > > > Using slot-sync in the error messages looks a bit odd to me. Can we use > > "replication slot sync worker ..." in both these and other similar messages? I > > think it would be better if we don't split the messages into multiple lines in > > these cases as messages don't appear too long to me. > > Changed as suggested. > > > > > 8. > > +/* > > + * Detach the worker from DSM and update 'proc' and 'in_use'. > > + * Logical replication launcher will come to know using these > > + * that the worker has shutdown. > > + */ > > +void > > +slotsync_worker_detach(int code, Datum arg) { > > > > I think the reference to DSM is leftover from the previous version of the patch. > > Can we change the above comments as per the new code? > > Changed. > > > > > 9. > > +static bool > > +slotsync_worker_launch() > > { > > ... > > + /* TODO: do we really need 'generation', analyse more here */ > > + worker->hdr.generation++; > > > > We should do something about this TODO. As per my understanding, we don't > > need a generation number for the slot sync worker as we have one such worker > > but I guess the patch requires it because we are using existing logical > > replication worker infrastructure. This brings the question of whether we really > > need a separate SlotSyncWorkerInfo or if we can use existing > > LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you > > didn't use it because most of the fields in LogicalRepWorker will be unused for > > slot sync worker. > > Will think about this one and update in next version. > > > > > 10. > > + * Can't use existing functions like 'get_database_oid' from > > +dbcommands.c for > > + * validity purpose as they need db connection. > > + */ > > +static bool > > +validate_dbname(const char *dbname) > > > > I don't know how important it is to validate the dbname before launching the > > sync slot worker because anyway after launching, it will give an error while > > initializing the connection if the dbname is invalid. But, if we think it is really > > required, did you consider using GetDatabaseTuple()? > > Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it possible to > release the restriction for the dbname. For example, slot sync worker could > always connect to the 'template1' as the worker doesn't update the > database objects. Although I didn't find some examples on server side, but some > client commands(e.g. pg_upgrade) will connect to template1 to check some global > objects. We use this dbname for 2 purposes: a) which you pointed out i.e. to have db connection in sync worker, b) to make connection to primary server's db so that we can run SELECT queries there. Thought of adding this point, so that we have complete info before deciding the next step. (Just FYI, the previous version patch used a replication command which > may avoid the dbname but was replaced with SELECT to improve the flexibility and > avoid introducing new command.) Would like to add more info here. 
We had a LIST command in launcher.c for the multi-worker design, to fetch dbids from the primary. For the single-worker case we do not need that info, so we got rid of that command. In slotsync.c we always used 'SELECT' queries and never had any replication command implemented there, so we never replaced a replication command with 'SELECT' in any of the patches. The reason we had a command in launcher.c is that the launcher originally did not have any db connection and we did not want to change that, while running 'SELECT' queries through the exposed libpq APIs needs a db connection; hence we retained the replication command in the launcher. In slotsync.c, on the other hand, we had to run multiple queries to get different information, so to retain flexibility and ease of extension over replication commands we decided to go with SELECT and thus opted for the db connection needed by the libpq APIs. thanks Shveta
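To illustrate the flexibility argument, with a database connection the slot sync worker can fetch whatever it needs from the primary with ordinary queries along these lines; the exact column list is the patch's choice, this one is only an example, and failover is the new column added by this patch set.

SELECT slot_name, plugin, two_phase, failover, database,
       restart_lsn, confirmed_flush_lsn, catalog_xmin
  FROM pg_catalog.pg_replication_slots
 WHERE failover AND NOT temporary;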
Hi, On 11/21/23 6:16 AM, Amit Kapila wrote: > On Mon, Nov 20, 2023 at 6:51 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> As far the 'i' state here, from what I see, it is currently useful for: >> >> 1. Cascading standby to not sync slots with state = 'i' from >> the first standby. >> 2. Easily report Slots that did not catch up on the primary yet. >> 3. Avoid inactive slots to block "active" ones creation. >> >> So not creating those slots should not be an issue for 1. (sync are >> not needed on cascading standby as not created on the first standby yet) >> but is an issue for 2. (unless we provide another way to keep track and report >> such slots) and 3. (as I think we should still need to reserve WAL). >> >> I've a question: we'd still need to reserve WAL for those slots, no? >> >> If that's the case and if we don't call ReplicationSlotCreate() then ReplicationSlotReserveWal() >> would not work as MyReplicationSlot would be NULL. >> > > Yes, we need to reserve WAL to see if we can sync the slot. We are > currently creating an RS_EPHEMERAL slot and if we don't explicitly > persist it when we can't sync, then it will be dropped when we do > ReplicationSlotRelease() at the end of synchronize_one_slot(). So, the > loss is probably, the next time we again try to sync the slot, we need > to again create it and may need to wait for newer restart_lsn on > standby Yeah, and doing so we'd reduce the time window to give the slot a chance to catch up (as opposed to create it a single time and maintain an 'i' state). > which could be avoided if we have the slot in 'i' state from > the previous run. Right. > I don't deny the importance of having 'i' > (initialized) state but was just trying to say that it has additional > code complexity. Right, and I think it's worth it. > OTOH, having it may give better visibility to even > users about slots that are not active (say manually created slots on > the primary). Agree. All that being said, on my side I'm +1 on keeping the 'i' state behavior as it is implemented currently (would be happy to hear others' opinions too). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Nov 21, 2023 at 10:02 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, November 17, 2023 7:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > PFA v35. > > > > > > > Review v35-0002* > > ============== > > Thanks for the comments. > > > 1. > > As quoted in the commit message, > > > > > If a logical slot is invalidated on the primary, slot on the standby is also > > invalidated. If a logical slot on the primary is valid but is invalidated on the > > standby due to conflict (say required rows removed on the primary), then that > > slot is dropped and recreated on the standby in next sync-cycle. > > It is okay to recreate such slots as long as these are not consumable on the > > standby (which is the case currently). > > > > > > > I think this won't happen normally because of the physical slot and > > hot_standby_feedback but probably can occur in cases like if the user > > temporarily switches hot_standby_feedback from on to off. Are there any other > > reasons? I think we can mention the cases along with it as well at least for now. > > Additionally, I think this should be covered in code comments as well. > > I will collect all these cases and update in next version. > > > > > 2. > > #include "postgres.h" > > - > > +#include "access/genam.h" > > > > Spurious line removal. > > Removed. > > > > > 3. > > A password needs to be provided too, if the sender demands > > password > > authentication. It can be provided in the > > <varname>primary_conninfo</varname> string, or in a separate > > - <filename>~/.pgpass</filename> file on the standby server (use > > - <literal>replication</literal> as the database name). > > - Do not specify a database name in the > > - <varname>primary_conninfo</varname> string. > > + <filename>~/.pgpass</filename> file on the standby server. > > + </para> > > + <para> > > + Specify <literal>dbname</literal> in > > + <varname>primary_conninfo</varname> string to allow > > synchronization > > + of slots from the primary server to the standby server. > > + This will only be used for slot synchronization. It is ignored > > + for streaming. > > > > Is there a reason to remove part of the earlier sentence "use > > <literal>replication</literal> as the database name"? > > Added it back. > > > > > 4. > > + <primary><varname>enable_syncslot</varname> configuration > > parameter</primary> > > + </indexterm> > > + </term> > > + <listitem> > > + <para> > > + It enables a physical standby to synchronize logical failover slots > > + from the primary server so that logical subscribers are not blocked > > + after failover. > > + </para> > > + <para> > > + It is enabled by default. This parameter can only be set in the > > + <filename>postgresql.conf</filename> file or on the server > > command line. > > + </para> > > > > I think you forgot to update the documentation for the default value of this > > variable. > > Updated. > > > > > 5. > > + * a) start the logical replication workers for every enabled subscription > > + * when not in standby_mode > > + * b) start the slot-sync worker for logical failover slots synchronization > > + * from the primary server when in standby_mode. > > > > Either use a full stop after both lines or none of these. > > Added a full stop. > > > > > 6. > > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker); > > > > There shouldn't be space between * and the worker. > > Removed, and added the type to typedefs.list. 
> > > > > 7. > > + if (!SlotSyncWorker->hdr.in_use) > > + { > > + LWLockRelease(SlotSyncWorkerLock); > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("replication slot-sync worker not initialized, " > > + "cannot attach"))); > > + } > > + > > + if (SlotSyncWorker->hdr.proc) > > + { > > + LWLockRelease(SlotSyncWorkerLock); > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("replication slot-sync worker is " > > + "already running, cannot attach"))); > > + } > > > > Using slot-sync in the error messages looks a bit odd to me. Can we use > > "replication slot sync worker ..." in both these and other similar messages? I > > think it would be better if we don't split the messages into multiple lines in > > these cases as messages don't appear too long to me. > > Changed as suggested. > > > > > 8. > > +/* > > + * Detach the worker from DSM and update 'proc' and 'in_use'. > > + * Logical replication launcher will come to know using these > > + * that the worker has shutdown. > > + */ > > +void > > +slotsync_worker_detach(int code, Datum arg) { > > > > I think the reference to DSM is leftover from the previous version of the patch. > > Can we change the above comments as per the new code? > > Changed. > > > > > 9. > > +static bool > > +slotsync_worker_launch() > > { > > ... > > + /* TODO: do we really need 'generation', analyse more here */ > > + worker->hdr.generation++; > > > > We should do something about this TODO. As per my understanding, we don't > > need a generation number for the slot sync worker as we have one such worker > > but I guess the patch requires it because we are using existing logical > > replication worker infrastructure. Yes, we do not need generation, but since we want to use existing logical-rep worker infrastructure, we can retain generation but can keep it as zero always for the slot-sync worker case. > > This brings the question of whether we really > > need a separate SlotSyncWorkerInfo or if we can use existing > > LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you > > didn't use it because most of the fields in LogicalRepWorker will be unused for > > slot sync worker. Yes, right. If we use LogicalRepWorker in the slot-sync worker, then it will be a task to keep a check (even in future) that no-one should end up using uninitialized fields in slot-sync code. That is why shifting common fields to LogicalWorkerHeader and using that in SlotSyncWorkerInfo and LogicalRepWorker seems a better approach to me. > > Will think about this one and update in next version. > > > > > 10. > > + * Can't use existing functions like 'get_database_oid' from > > +dbcommands.c for > > + * validity purpose as they need db connection. > > + */ > > +static bool > > +validate_dbname(const char *dbname) > > > > I don't know how important it is to validate the dbname before launching the > > sync slot worker because anyway after launching, it will give an error while > > initializing the connection if the dbname is invalid. But, if we think it is really > > required, did you consider using GetDatabaseTuple()? > > Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it possible to > release the restriction for the dbname. For example, slot sync worker could > always connect to the 'template1' as the worker doesn't update the > database objects. Although I didn't find some examples on server side, but some > client commands(e.g. 
pg_upgrade) will connect to template1 to check some global > objects. (Just FYI, the previous version patch used a replication command which > may avoid the dbname but was replaced with SELECT to improve the flexibility and > avoid introducing new command.) > > Best Regards, > Hou zj > >
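To make the shared-header layout discussed in item 9 above a bit more concrete, here is a minimal sketch, assuming the usual PostgreSQL types; the field names follow the hdr.in_use / hdr.proc / hdr.generation accesses quoted earlier and the last_update_time field mentioned later in the thread, while everything else is illustrative rather than taken from the posted patch:

/* Fields common to the slot-sync worker and logical replication workers. */
typedef struct LogicalWorkerHeader
{
	bool		in_use;			/* is this worker array entry allocated? */
	uint16		generation;		/* bumped on reuse; kept at 0 for slot sync */
	PGPROC	   *proc;			/* process attached to the entry, or NULL */
} LogicalWorkerHeader;

/* The slot-sync worker then carries only the shared header plus its own fields. */
typedef struct SlotSyncWorkerInfo
{
	LogicalWorkerHeader hdr;
	TimestampTz	last_update_time;	/* slot-sync specific bookkeeping */
} SlotSyncWorkerInfo;

LogicalRepWorker would embed the same header, so code that only touches in_use, proc or generation can be shared without exposing the unrelated apply-worker fields to the slot-sync code.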
On Tue, Nov 21, 2023 at 1:13 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/21/23 6:16 AM, Amit Kapila wrote: > > On Mon, Nov 20, 2023 at 6:51 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> As far the 'i' state here, from what I see, it is currently useful for: > >> > >> 1. Cascading standby to not sync slots with state = 'i' from > >> the first standby. > >> 2. Easily report Slots that did not catch up on the primary yet. > >> 3. Avoid inactive slots to block "active" ones creation. > >> > >> So not creating those slots should not be an issue for 1. (sync are > >> not needed on cascading standby as not created on the first standby yet) > >> but is an issue for 2. (unless we provide another way to keep track and report > >> such slots) and 3. (as I think we should still need to reserve WAL). > >> > >> I've a question: we'd still need to reserve WAL for those slots, no? > >> > >> If that's the case and if we don't call ReplicationSlotCreate() then ReplicationSlotReserveWal() > >> would not work as MyReplicationSlot would be NULL. > >> > > > > Yes, we need to reserve WAL to see if we can sync the slot. We are > > currently creating an RS_EPHEMERAL slot and if we don't explicitly > > persist it when we can't sync, then it will be dropped when we do > > ReplicationSlotRelease() at the end of synchronize_one_slot(). So, the > > loss is probably, the next time we again try to sync the slot, we need > > to again create it and may need to wait for newer restart_lsn on > > standby > > Yeah, and doing so we'd reduce the time window to give the slot a chance > to catch up (as opposed to create it a single time and maintain an 'i' state). > > > which could be avoided if we have the slot in 'i' state from > > the previous run. > > Right. > > > I don't deny the importance of having 'i' > > (initialized) state but was just trying to say that it has additional > > code complexity. > > Right, and I think it's worth it. > > > OTOH, having it may give better visibility to even > > users about slots that are not active (say manually created slots on > > the primary). > > Agree. > > All that being said, on my side I'm +1 on keeping the 'i' state behavior > as it is implemented currently (would be happy to hear others' opinions too). > +1 for 'i' state. I feel it gives a better slot-sync functionality (optimizing redo-effort for inactive slots, inactive not blocking active ones) along with its usage for monitoring purposes.
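For readers following the state discussion above, a small sketch of the sync_state values being referred to; the constant names and the 'i'/'r' characters appear in the patch code quoted later in the thread, while the 'n' character for the NONE state is an assumption here:

/* Per-slot sync_state on the standby, as discussed above. */
#define SYNCSLOT_STATE_NONE		'n'	/* slot was not created by the slot-sync worker */
#define SYNCSLOT_STATE_INITIATED	'i'	/* created by the worker, still waiting to catch up */
#define SYNCSLOT_STATE_READY		'r'	/* caught up; can be used after failover */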
On Tue, Nov 21, 2023 at 2:02 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Nov 21, 2023 at 1:13 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 11/21/23 6:16 AM, Amit Kapila wrote: > > > On Mon, Nov 20, 2023 at 6:51 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > >> As far the 'i' state here, from what I see, it is currently useful for: > > >> > > >> 1. Cascading standby to not sync slots with state = 'i' from > > >> the first standby. > > >> 2. Easily report Slots that did not catch up on the primary yet. > > >> 3. Avoid inactive slots to block "active" ones creation. > > >> > > >> So not creating those slots should not be an issue for 1. (sync are > > >> not needed on cascading standby as not created on the first standby yet) > > >> but is an issue for 2. (unless we provide another way to keep track and report > > >> such slots) and 3. (as I think we should still need to reserve WAL). > > >> > > >> I've a question: we'd still need to reserve WAL for those slots, no? > > >> > > >> If that's the case and if we don't call ReplicationSlotCreate() then ReplicationSlotReserveWal() > > >> would not work as MyReplicationSlot would be NULL. > > >> > > > > > > Yes, we need to reserve WAL to see if we can sync the slot. We are > > > currently creating an RS_EPHEMERAL slot and if we don't explicitly > > > persist it when we can't sync, then it will be dropped when we do > > > ReplicationSlotRelease() at the end of synchronize_one_slot(). So, the > > > loss is probably, the next time we again try to sync the slot, we need > > > to again create it and may need to wait for newer restart_lsn on > > > standby > > > > Yeah, and doing so we'd reduce the time window to give the slot a chance > > to catch up (as opposed to create it a single time and maintain an 'i' state). > > > > > which could be avoided if we have the slot in 'i' state from > > > the previous run. > > > > Right. > > > > > I don't deny the importance of having 'i' > > > (initialized) state but was just trying to say that it has additional > > > code complexity. > > > > Right, and I think it's worth it. > > > > > OTOH, having it may give better visibility to even > > > users about slots that are not active (say manually created slots on > > > the primary). > > > > Agree. > > > > All that being said, on my side I'm +1 on keeping the 'i' state behavior > > as it is implemented currently (would be happy to hear others' opinions too). > > > > +1 for 'i' state. I feel it gives a better slot-sync functionality > (optimizing redo-effort for inactive slots, inactive not blocking > active ones) along with its usage for monitoring purposes. v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, rebased the patches. PFA v37_2 patches. thanks Shveta
Attachment
Hi, On 11/21/23 10:32 AM, shveta malik wrote: > On Tue, Nov 21, 2023 at 2:02 PM shveta malik <shveta.malik@gmail.com> wrote: >> > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, > rebased the patches. PFA v37_2 patches. Thanks! Regarding the promotion flow: If the primary is available and reachable I don't think we currently try to ensure that slots are in sync. I think we'd miss the activity since the last sync and the promotion request or am I missing something? If the primary is available and reachable shouldn't we launch a last round of synchronization (skipping all the slots that are not in 'r' state)? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, November 21, 2023 5:33 PM shveta malik <shveta.malik@gmail.com> wrote: > > > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, rebased the > patches. PFA v37_2 patches. Thanks for updating the patches. I'd like to discuss one issue related to the correct handling of failover flag when executing ALTER SUBSCRIPTION SET (slot_name = 'new_slot')". Since the command intends to use a new slot on the primary, the new slot needs to reflect the "failover" state that the subscription currently has. If the failoverstate of the Subscription is LOGICALREP_FAILOVER_STATE_ENABLED, then I can reset it to LOGICALREP_FAILOVER_STATE_PENDING and allow the apply worker to handle it the way it is handled today (just like two_phase handling). But if the failoverstate is LOGICALREP_FAILOVER_STATE_DISABLED, the original idea is to call walrcv_alter_slot and alter the slot from the "ALTER SUBSCRIPTION" handling backend itself. This works if the slot is currently disabled. But the " ALTER SUBSCRIPTION SET (slot_name = 'new_slot')" command is supported even if the subscription is enabled. If the subscription is enabled, then calling walrcv_alter_slot() fails because the slot is still acquired by apply worker. So, I am thinking do we need a new mechanism to change the failover flag to false on an enabled subscription ? For example, we could call walrcv_alter_slot on startup of apply worker if AllTablesyncsReady(), for both true and false values of failover flag. This way, every time apply worker is started, it calls walrcv_alter_slot to set the failover flag on the primary. Or we could just document that it is user's responsibility to match the failover property in case it changes the slot_name. Thoughts ? Best Regards, Hou zj
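For illustration only, the "re-assert the failover flag when the apply worker starts" idea floated above might look roughly like the sketch below. It is not part of any posted patch; the wrapper name is hypothetical, the LogRepWorkerWalRcvConn connection and the failoverstate field on MySubscription are assumptions, and only the walrcv_alter_slot() call shape and the state constant come from this thread.

/* Hypothetical sketch of calling walrcv_alter_slot() at apply worker start. */
static void
reassert_failover_on_worker_start(void)
{
	if (AllTablesyncsReady())
		walrcv_alter_slot(LogRepWorkerWalRcvConn,
						  MySubscription->slotname,
						  MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED);
}

As discussed in the follow-up messages, simply documenting the user's responsibility avoids this extra round trip on every worker start.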
In addition to my recent v35-0001 comment not yet addressed [1], here are some review comments for patch v37-0001. ====== src/backend/replication/walsender.c 1. PhysicalWakeupLogicalWalSnd +/* + * Wake up logical walsenders with failover-enabled slots if the physical slot + * of the current walsender is specified in standby_slot_names GUC. + */ +void +PhysicalWakeupLogicalWalSnd(void) +{ + ListCell *lc; + List *standby_slots; + bool slot_in_list = false; + + Assert(MyReplicationSlot != NULL); + Assert(SlotIsPhysical(MyReplicationSlot)); + + standby_slots = GetStandbySlotList(false); + + foreach(lc, standby_slots) + { + char *name = lfirst(lc); + + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) + { + slot_in_list = true; + break; + } + } + + if (slot_in_list) + ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv); +} 1a. Easier to have single assertion -- Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot)); ~ 1b. Why bother with the 'slot_in_list' and break, when you can just call the ConditionVariableBroadcast() and return without having the extra variable? ====== src/test/recovery/t/050_verify_slot_order.pl ~~~ 2. Should you name the global objects with a 'regress_' prefix which seems to be the standard for other new TAP tests? ~~~ 3. +# +# | ----> standby1 (connected via streaming replication) +# | ----> standby2 (connected via streaming replication) +# primary ----- | +# | ----> subscriber1 (connected via logical replication) +# | ----> subscriber2 (connected via logical replication) +# +# +# Set up is configured in such a way that primary never lets subscriber1 ahead +# of standby1. 3a. Misaligned "|" in comment? ~ 3b. IMO it would be better to give an overview of how this all works instead of just saying "configured in such a way". ~~~ 4. +# Configure primary to disallow specified logical replication slot (lsub1_slot) +# getting ahead of specified physical replication slot (sb1_slot). +$primary->append_conf( It is confusing because there is no "lsub1_slot" specified anywhere until much later. Would you be able to provide some more details? ~~~ 5. +# Create another subscriber node, wait for sync to complete +my $subscriber2 = PostgreSQL::Test::Cluster->new('subscriber2'); +$subscriber2->init(allows_streaming => 'logical'); +$subscriber2->start; +$subscriber2->safe_psql('postgres', "CREATE TABLE tab_int (a int PRIMARY KEY);"); +$subscriber2->safe_psql('postgres', + "CREATE SUBSCRIPTION mysub2 CONNECTION '$publisher_connstr' " + . "PUBLICATION mypub WITH (slot_name = lsub2_slot);"); +$subscriber2->wait_for_subscription_sync; Maybe this comment should explicitly say there is no failover enabled here. Maybe the SUBSCRIPTION should explicitly set failover=false? ~~~ 6. +# The subscription that's up and running and is enabled for failover +# doesn't get the data from primary and keeps waiting for the +# standby specified in standby_slot_names. +$result = $subscriber1->safe_psql('postgres', + "SELECT count(*) = 0 FROM tab_int;"); +is($result, 't', "subscriber1 doesn't get data from primary until standby1 acknowledges changes"); Might it be better to write as "SELECT count(*) = $primary_row_count FROM tab_int;" and expect it to return false? ====== src/test/regress/expected/subscription.out 7. Everything here displays the "Failover" state 'd' (disabled). How about tests for different state values? ====== [1] https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD55bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
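Putting review comments 1a and 1b together, the simplified function being suggested would look roughly like this; it is based only on the code quoted above, not on the actual patch:

/*
 * Wake up logical walsenders with failover-enabled slots if the physical slot
 * of the current walsender is listed in standby_slot_names.
 */
void
PhysicalWakeupLogicalWalSnd(void)
{
	ListCell   *lc;
	List	   *standby_slots;

	Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot));

	standby_slots = GetStandbySlotList(false);

	foreach(lc, standby_slots)
	{
		char	   *name = lfirst(lc);

		if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0)
		{
			/* Found our slot in the list; wake the waiters and return. */
			ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv);
			return;
		}
	}
}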
On Tue, Nov 21, 2023 at 4:35 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/21/23 10:32 AM, shveta malik wrote: > > On Tue, Nov 21, 2023 at 2:02 PM shveta malik <shveta.malik@gmail.com> wrote: > >> > > > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, > > rebased the patches. PFA v37_2 patches. > > Thanks! > > Regarding the promotion flow: If the primary is available and reachable I don't > think we currently try to ensure that slots are in sync. I think we'd miss the > activity since the last sync and the promotion request or am I missing something? > > If the primary is available and reachable shouldn't we launch a last round of > synchronization (skipping all the slots that are not in 'r' state)? > We may miss the last round but there is no guarantee that we can ensure to sync of everything if the primary is available. Because after our last sync, there could probably be some more activity. I think it is the user's responsibility to promote a new primary when the old one is not required for some reason. It is not only slots that can be out of sync but even we can miss fetching some of the data. I think this is quite similar to what we do for WAL where on finding the promotion signal, we shut down Walreceiver and just replay any WAL that was already received by walreceiver. Also, the promotion shouldn't create any problem w.r.t subscribers connecting to the new primary because the slot's position is slightly behind what could be requested by subscribers which means the corresponding data will be available on the new primary. Do you have something in mind that can create any problem if we don't attempt additional fetching round after the promotion signal is received? -- With Regards, Amit Kapila.
On Mon, Nov 20, 2023 at 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Nov 18, 2023 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Nov 17, 2023 at 5:18 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > More Review for v35-0002* > > ============================ > > > Thanks for the feedback. Please find the patch attached and my comments inline. > More review of v35-0002* > ==================== > 1. > +/* > + * Helper function to check if local_slot is present in remote_slots list. > + * > + * It also checks if logical slot is locally invalidated i.e. invalidated on > + * the standby but valid on the primary server. If found so, it sets > + * locally_invalidated to true. > + */ > +static bool > +slot_exists_in_list(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > > The name of the function is a bit misleading because it checks the > validity of the slot not only whether it exists in remote_list. Would > it be better to name it as ValidateSyncSlot() or something along those > lines? > Sure, updated the name. > 2. > +static long > +synchronize_slots(WalReceiverConn *wrconn) > { > ... > + /* Construct query to get slots info from the primary server */ > + initStringInfo(&s); > + construct_slot_query(&s); > ... > + if (remote_slot->conflicting) > + remote_slot->invalidated = get_remote_invalidation_cause(wrconn, > + remote_slot->name); > ... > > +static ReplicationSlotInvalidationCause > +get_remote_invalidation_cause(WalReceiverConn *wrconn, char *slot_name) > { > ... > + appendStringInfo(&cmd, > + "SELECT pg_get_slot_invalidation_cause(%s)", > + quote_literal_cstr(slot_name)); > + res = walrcv_exec(wrconn, cmd.data, 1, slotRow); > > Do we really need to query a second time to get the invalidation > cause? Can we adjust the slot_query to get it in one round trip? I > think this may not optimize much because the patch uses second round > trip only for invalidated slots but still looks odd. So unless the > query becomes too complicated, we should try to achive it one round > trip. > Modified the query to fetch all the info at once. > 3. > +static long > +synchronize_slots(WalReceiverConn *wrconn) > +{ > ... > ... > + /* The syscache access needs a transaction env. */ > + StartTransactionCommand(); > + > + /* Make things live outside TX context */ > + MemoryContextSwitchTo(oldctx); > + > + /* Construct query to get slots info from the primary server */ > + initStringInfo(&s); > + construct_slot_query(&s); > + > + elog(DEBUG2, "slot-sync worker's query:%s \n", s.data); > + > + /* Execute the query */ > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); > > It is okay to perform the above query execution outside the > transaction context but I would like to know the reason for the same. > Do we want to retain anything beyond the transaction context or is > there some other reason to do this outside the transaction context? > Modified the comment with the reason. We need to start a transaction for syscache access. We can end it as soon as walrcv_exec() is over, but we need the tuple-results to be accessed even after that, thus those should not be allocated in TopTransactionContext. > 4. > +static void > +construct_slot_query(StringInfo s) > +{ > + /* > + * Fetch data for logical failover slots with sync_state either as > + * SYNCSLOT_STATE_NONE or SYNCSLOT_STATE_READY. 
> + */ > + appendStringInfo(s, > + "SELECT slot_name, plugin, confirmed_flush_lsn," > + " restart_lsn, catalog_xmin, two_phase, conflicting, " > + " database FROM pg_catalog.pg_replication_slots" > + " WHERE failover and sync_state != 'i'"); > +} > > Why would the sync_state on the primary server be any valid value? I > thought it was set only on physical standby. I think it is better to > mention the reason for using the sync state and or failover flag in > the above comments. The current comment doesn't seem of much use as it > just states what is evident from the query. Updated the reason in comment. It is mainly for cascading standby to fetch correct slots. > > 5. > * This check should never pass as on the primary server, we have waited > + * for the standby's confirmation before updating the logical slot. But to > + * take care of any bug in that flow, we should retain this check. > + */ > + if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd) > + { > + elog(LOG, "skipping sync of slot \"%s\" as the received slot-sync " > + "LSN %X/%X is ahead of the standby position %X/%X", > + remote_slot->name, > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > + LSN_FORMAT_ARGS(WalRcv->latestWalEnd)); > + > > This should be elog(ERROR, ..). Normally, we use elog(ERROR, ...) for > such unexpected cases. And, you don't need to explicitly mention the > last sentence in the comment: "But to take care of any bug in that > flow, we should retain this check.". > Sure, modified. > 6. > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > { > ... > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) > + { > + ereport(WARNING, > + errmsg("not synchronizing slot %s; synchronization would" > + " move it backwards", remote_slot->name)); > > I think here elevel should be LOG because user can't do much about > this. Do we use ';' at other places in the message? But when can we > hit this case? We can add some comments to state in which scenario > this possible. OTOH, if this is sort of can't happen case and we have > kept it to avoid any sort of inconsistency then we can probably use > elog(ERROR, .. with approapriate LSN locations, so that later the > problem could be debugged. > Converted to ERROR and updated comment > 7. > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > { > ... > + > + StartTransactionCommand(); > + > + /* Make things live outside TX context */ > + MemoryContextSwitchTo(oldctx); > + > ... > > Similar to one of the previous comments, it is not clear to me why the > patch is doing a memory context switch here. Can we add a comment? > I have removed the memory-context-switch here as the results are all consumed within the span of transaction, so we do not need to retain those even after commit of txn for this particular case. > 8. > + /* User created slot with the same name exists, raise ERROR. */ > + else if (sync_state == SYNCSLOT_STATE_NONE) > + { > + ereport(ERROR, > + errmsg("not synchronizing slot %s; it is a user created slot", > + remote_slot->name)); > + } > > Won't we need error_code in this error? Also, the message doesn't seem > to follow the code's usual style. Modified. I have added errdetail as well, but not sure what we can add as error-hint, Shall we add something like: Try renaming existing slot. > > 9. > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > { > ... 
> + else > + { > + TransactionId xmin_horizon = InvalidTransactionId; > + ReplicationSlot *slot; > + > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > + remote_slot->two_phase, false); > + slot = MyReplicationSlot; > + > + SpinLockAcquire(&slot->mutex); > + slot->data.database = get_database_oid(remote_slot->database, false); > + > + /* Mark it as sync initiated by slot-sync worker */ > + slot->data.sync_state = SYNCSLOT_STATE_INITIATED; > + slot->data.failover = true; > + > + namestrcpy(&slot->data.plugin, remote_slot->plugin); > + SpinLockRelease(&slot->mutex); > + > + ReplicationSlotReserveWal(); > + > > How and when will this init state (SYNCSLOT_STATE_INITIATED) persist to disk? This will be inside wait_for_primary_and_sync. I have reorganized code here (removed wait_for_primary_and_sync) to make it more readable. > > 10. > + if (slot_updated) > + SlotSyncWorker->last_update_time = now; > + > + else if (TimestampDifferenceExceeds(SlotSyncWorker->last_update_time, > + now, WORKER_INACTIVITY_THRESHOLD_MS)) > > Empty line between if/else if is not required. > This is added by pg_indent. Not sure how we can correct it. > 11. > +static WalReceiverConn * > +remote_connect() > +{ > + WalReceiverConn *wrconn = NULL; > + char *err; > + > + wrconn = walrcv_connect(PrimaryConnInfo, true, false, "slot-sync", &err); > + if (wrconn == NULL) > + ereport(ERROR, > + (errmsg("could not connect to the primary server: %s", err))); > > Let's use appname similar to what we do for "walreceiver" as shown below: > /* Establish the connection to the primary for XLOG streaming */ > wrconn = walrcv_connect(conninfo, false, false, > cluster_name[0] ? cluster_name : "walreceiver", > &err); > if (!wrconn) > ereport(ERROR, > (errcode(ERRCODE_CONNECTION_FAILURE), > errmsg("could not connect to the primary server: %s", err))); > > Some proposals for default appname "slotsynchronizer", "slotsync > worker". Also, use the same error code as used by "walreceiver". Modified. > > 12. Do we need the handling of the slotsync worker in > GetBackendTypeDesc()? Please check without that what value this patch > displays for backend_type. It currently displays "slot sync worker'. It is the same desc which launcher has launched this worker with (snprintf(bgw.bgw_type, BGW_MAXLEN, "slot sync worker")). postgres=# select backend_type from pg_stat_activity; backend_type ------------------------------ logical replication launcher slot sync worker ....... For slot sync and logical launcher, BackendType is B_BG_WORKER and thus pg_stat_get_activity() for this type displays backend_type as the one given during background process registration and thus we get these correctly. But pg_stat_get_io() does not have the same implementation, it displays 'background worker' as the description. I think slot-sync and logical launcher are one of these entries postgres=# select backend_type from pg_stat_io; backend_type --------------------- autovacuum launcher .. background worker background worker background worker background worker background worker background writer ..... > > 13. > +/* > + * Re-read the config file. > + * > + * If primary_conninfo has changed, reconnect to primary. 
> + */ > +static void > +slotsync_reread_config(WalReceiverConn **wrconn) > +{ > + char *conninfo = pstrdup(PrimaryConnInfo); > + > + ConfigReloadPending = false; > + ProcessConfigFile(PGC_SIGHUP); > + > + /* Reconnect if GUC primary_conninfo got changed */ > + if (strcmp(conninfo, PrimaryConnInfo) != 0) > + { > + if (*wrconn) > + walrcv_disconnect(*wrconn); > + > + *wrconn = remote_connect(); > > I think we should exit the worker in this case and allow it to > reconnect. See the similar handling in maybe_reread_subscription(). > One effect of not doing is that the dbname patch has used in > ReplSlotSyncWorkerMain() will become inconsistent. > Modified as suggested. > 14. > +void > +ReplSlotSyncWorkerMain(Datum main_arg) > +{ > ... > ... > + /* > + * If the standby has been promoted, skip the slot synchronization process. > + * > + * Although the startup process stops all the slot-sync workers on > + * promotion, the launcher may not have realized the promotion and could > + * start additional workers after that. Therefore, this check is still > + * necessary to prevent these additional workers from running. > + */ > + if (PromoteIsTriggered()) > + exit(0); > ... > ... > + /* Check if got promoted */ > + if (!RecoveryInProgress()) > + { > + /* > + * Drop the slots for which sync is initiated but not yet > + * completed i.e. they are still waiting for the primary server to > + * catch up. > + */ > + slotsync_drop_initiated_slots(); > + ereport(LOG, > + errmsg("exiting slot-sync woker on promotion of standby")); > > I think we should never reach this code in non-standby mode. It should > elog(ERROR,.. Can you please explain why promotion handling is > required here? I will handle this in the next version. It needs some more thoughts, especially on how 'PromoteIsTriggered' can be removed. > > 15. > @@ -190,6 +190,8 @@ static const char *const BuiltinTrancheNames[] = { > "LogicalRepLauncherDSA", > /* LWTRANCHE_LAUNCHER_HASH: */ > "LogicalRepLauncherHash", > + /* LWTRANCHE_SLOTSYNC_DSA: */ > + "SlotSyncWorkerDSA", > }; > ... > ... > + LWTRANCHE_SLOTSYNC_DSA, > LWTRANCHE_FIRST_USER_DEFINED, > } BuiltinTrancheIds; > > These are not used in the patch. > Removed. > 16. > +/* ------------------------------- > + * LIST_DBID_FOR_FAILOVER_SLOTS command > + * ------------------------------- > + */ > +typedef struct ListDBForFailoverSlotsCmd > +{ > + NodeTag type; > + List *slot_names; > +} ListDBForFailoverSlotsCmd; > > ... > > +/* > + * Failover logical slots data received from remote. > + */ > +typedef struct WalRcvFailoverSlotsData > +{ > + Oid dboid; > +} WalRcvFailoverSlotsData; > > These structures don't seem to be used in the current version of the patch. Removed. > > 17. > --- a/src/include/replication/slot.h > +++ b/src/include/replication/slot.h > @@ -15,7 +15,6 @@ > #include "storage/lwlock.h" > #include "storage/shmem.h" > #include "storage/spin.h" > -#include "replication/walreceiver.h" > ... > ... > -extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn); > extern List *GetStandbySlotList(bool copy); > > Why the above two are removed as part of this patch? WaitForStandbyLSN() is no longer there, so that is why it was removed. I think it should have been removed from patch0001. WIll make this change in the next version where we have pacth0001 changes coming. Regarding header inclusion and 'ReplicationSlotDropAtPubNode' removal, not sure when those were removed. But my best guess is that the header inclusion chain has changed a little bit in patch. 
The tablesync.c uses ReplicationSlotDropAtPubNode which is part of subscriptioncmds.h. Now in our patch since tablesync.c includes subscriptioncmds.h and thus slot.h need not to extern it for tablesync.c. And if we can get rid of ReplicationSlotDropAtPubNode in slot.h, then walreceiver.h inclusion can also be removed as that was needed for 'WalReceiverConn' argument of ReplicationSlotDropAtPubNode. There could be other 'header inclusions' involved as well but this seems the primary reason. > -- > With Regards, > Amit Kapila.
Attachment
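Regarding item 11 above (the appname and error code used by remote_connect), the agreed direction presumably follows the quoted walreceiver example, roughly as below; the "slotsyncworker" string is a placeholder, since the thread only lists proposals for the final name:

/* Connect to the primary server for slot synchronization (sketch). */
static WalReceiverConn *
remote_connect(void)
{
	WalReceiverConn *wrconn;
	char	   *err;

	wrconn = walrcv_connect(PrimaryConnInfo, true, false,
							cluster_name[0] ? cluster_name : "slotsyncworker",
							&err);
	if (wrconn == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_CONNECTION_FAILURE),
				 errmsg("could not connect to the primary server: %s", err)));

	return wrconn;
}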
On Tue, Nov 21, 2023 at 8:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, > rebased the patches. PFA v37_2 patches. Thanks for the patch. Some comments: subscriptioncmds.c: CreateSubscription() and tablesync.c: process_syncing_tables_for_apply() walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled, - CRS_NOEXPORT_SNAPSHOT, NULL); - - if (twophase_enabled) - UpdateTwoPhaseState(subid, LOGICALREP_TWOPHASE_STATE_ENABLED); - + failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL); either here or in libpqrcv_create_slot(), shouldn't you check the remote server version if it supports the failover flag? + + /* + * If only the slot_name is specified, it is possible that the user intends to + * use an existing slot on the publisher, so here we enable failover for the + * slot if requested. + */ + else if (opts.slot_name && failover_enabled) + { + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover); + ereport(NOTICE, + (errmsg("enabled failover for replication slot \"%s\" on publisher", + opts.slot_name))); + } Here, the code only alters the slot if failover = true. You could use "else if (opts.slot_name && IsSet(opts.specified_opts, SUBOPT_FAILOVER)" to check if the failover flag is specified and alter for failover=false as well. Also, shouldn't you check for the server version if the command ALTER_REPLICATION_SLOT is supported? slot.c: ReplicationSlotAlter() +void +ReplicationSlotAlter(const char *name, bool failover) +{ + Assert(MyReplicationSlot == NULL); + + ReplicationSlotAcquire(name, true); + + if (SlotIsPhysical(MyReplicationSlot)) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot use %s with a physical replication slot", + "ALTER_REPLICATION_SLOT")); shouldn't you release the slot by calling ReplicationSlotRelease before erroring out? slot.c: +/* + * A helper function to validate slots specified in standby_slot_names GUCs. + */ +static bool +validate_standby_slots(char **newval) +{ + char *rawname; + List *elemlist; + ListCell *lc; + + /* Need a modifiable copy of string */ + rawname = pstrdup(*newval); rawname is not always freed. launcher.c: + SlotSyncWorker->hdr.proc = MyProc; + + before_shmem_exit(slotsync_worker_detach, (Datum) 0); + + LWLockRelease(SlotSyncWorkerLock); +} before_shmem_exit() can error out leaving the lock acquired. Maybe you should release the lock prior to calling before_shmem_exit() because you don't need the lock there. regards, Ajin Cherian Fujitsu Australia
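On the ReplicationSlotAlter() point above, the change being asked about would look something like the sketch below; whether the explicit release is actually needed depends on the error cleanup paths, which is exactly the reviewer's question, and the rest of the function is elided.

void
ReplicationSlotAlter(const char *name, bool failover)
{
	Assert(MyReplicationSlot == NULL);

	ReplicationSlotAcquire(name, true);

	if (SlotIsPhysical(MyReplicationSlot))
	{
		/* Release the acquired slot before erroring out, as suggested. */
		ReplicationSlotRelease();
		ereport(ERROR,
				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				errmsg("cannot use %s with a physical replication slot",
					   "ALTER_REPLICATION_SLOT"));
	}

	/* ... the actual update of the slot's failover flag is elided ... */
}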
On Wed, Nov 22, 2023 at 10:02 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, November 21, 2023 5:33 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, rebased the > > patches. PFA v37_2 patches. > > Thanks for updating the patches. > > I'd like to discuss one issue related to the correct handling of failover flag > when executing ALTER SUBSCRIPTION SET (slot_name = 'new_slot')". > > Since the command intends to use a new slot on the primary, the new slot needs > to reflect the "failover" state that the subscription currently has. If the > failoverstate of the Subscription is LOGICALREP_FAILOVER_STATE_ENABLED, then I > can reset it to LOGICALREP_FAILOVER_STATE_PENDING and allow the apply worker to > handle it the way it is handled today (just like two_phase handling). > > But if the failoverstate is LOGICALREP_FAILOVER_STATE_DISABLED, the original > idea is to call walrcv_alter_slot and alter the slot from the "ALTER > SUBSCRIPTION" handling backend itself. This works if the slot is currently > disabled. But the " ALTER SUBSCRIPTION SET (slot_name = 'new_slot')" command is > supported even if the subscription is enabled. If the subscription is enabled, > then calling walrcv_alter_slot() fails because the slot is still acquired by > apply worker. > > So, I am thinking do we need a new mechanism to change the failover flag to > false on an enabled subscription ? For example, we could call walrcv_alter_slot > on startup of apply worker if AllTablesyncsReady(), for both true and false > values of failover flag. This way, every time apply worker is started, it calls > walrcv_alter_slot to set the failover flag on the primary. > I think for the false case, we need to execute walrcv_alter_slot() every time at the start of apply worker and it doesn't sound like an ideal way to achieve it. > Or we could just document that it is user's responsibility to match the failover > property in case it changes the slot_name. > Personally, I think we should document this behavior instead of complicating the patch and the user anyway has a way to achieve it. -- With Regards, Amit Kapila.
Hi, On 11/23/23 6:13 AM, Amit Kapila wrote: > On Tue, Nov 21, 2023 at 4:35 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 11/21/23 10:32 AM, shveta malik wrote: >>> On Tue, Nov 21, 2023 at 2:02 PM shveta malik <shveta.malik@gmail.com> wrote: >>>> >> >>> v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, >>> rebased the patches. PFA v37_2 patches. >> >> Thanks! >> >> Regarding the promotion flow: If the primary is available and reachable I don't >> think we currently try to ensure that slots are in sync. I think we'd miss the >> activity since the last sync and the promotion request or am I missing something? >> >> If the primary is available and reachable shouldn't we launch a last round of >> synchronization (skipping all the slots that are not in 'r' state)? >> > > We may miss the last round but there is no guarantee that we can > ensure to sync of everything if the primary is available. Because > after our last sync, there could probably be some more activity. I don't think so thanks to the fact that we ensure that logical walsenders on the primary wait for the physical standby. Indeed that should prevent any decoding activity on the primary while the promotion is in progress on the standby (at least as soon as the walreceiver is shutdown). So that I think that a promotion flow like: - walreceiver shutdown - last round of sync - sync-worker shutdown Should ensure that slots are in sync (as logical slots on the primary should not be able to advance as soon as the walreceiver is shutdown during the promotion). > I think it is the user's responsibility to promote a new primary when > the old one is not required for some reason. Do you mean they should ensure something like? 1. no more activity on the primary 2. check that the slots are in sync with the primary 3. promote but then they could also (without the new feature we're building): 1. create and advance slots manually (pg_replication_slot_advance) on the standby to sync them up at regular interval and then before promotion: 2. ensure no more activity on the primary 3. last round of advance slots manually 3. promote I think that ensuring the slots are in sync during promotion (should the primary be available) would provide added value as compared to the above scenarios. > It is not only slots that > can be out of sync but even we can miss fetching some of the data. I > think this is quite similar to what we do for WAL where on finding the > promotion signal, we shut down Walreceiver and just replay any WAL > that was already received by walreceiver. > Also, the promotion > shouldn't create any problem w.r.t subscribers connecting to the new > primary because the slot's position is slightly behind what could be > requested by subscribers which means the corresponding data will be > available on the new primary. > Right. > Do you have something in mind that can create any problem if we don't > attempt additional fetching round after the promotion signal is > received? It's not a "real" problem per say, but in case of non synced slot, I can see 2 cases: - publisher/subscriber case: I don't see any problem here, since after an "alter subscription XXX connection '<new_primary>'" logical replication should start from the right place thanks to the replication origin associated to the subscription. 
- non publisher/subscriber case (say pg_recvlogical that does not make use of replication origin) then: a) data since the last sync and promotion would be decoded again unless b) or c) b) user manually advances the slot on the standby after promotion c) user restarts the decoding with an appropriate --startpos option That's for this non publisher/subscriber case that I think it would be beneficial to try to ensure that the slots are in sync during the promotion. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, November 23, 2023 11:45 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: Hi, > > On 11/23/23 6:13 AM, Amit Kapila wrote: > > On Tue, Nov 21, 2023 at 4:35 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> On 11/21/23 10:32 AM, shveta malik wrote: > >>> On Tue, Nov 21, 2023 at 2:02 PM shveta malik <shveta.malik@gmail.com> > wrote: > >>>> > >> > >>> v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, > >>> rebased the patches. PFA v37_2 patches. > >> > >> Thanks! > >> > >> Regarding the promotion flow: If the primary is available and > >> reachable I don't think we currently try to ensure that slots are in > >> sync. I think we'd miss the activity since the last sync and the promotion > request or am I missing something? > >> > >> If the primary is available and reachable shouldn't we launch a last > >> round of synchronization (skipping all the slots that are not in 'r' state)? > >> > > > > We may miss the last round but there is no guarantee that we can > > ensure to sync of everything if the primary is available. Because > > after our last sync, there could probably be some more activity. > > I don't think so thanks to the fact that we ensure that logical walsenders on the > primary wait for the physical standby. > > Indeed that should prevent any decoding activity on the primary while the > promotion is in progress on the standby (at least as soon as the walreceiver is > shutdown). > > So that I think that a promotion flow like: > > - walreceiver shutdown > - last round of sync > - sync-worker shutdown > > Should ensure that slots are in sync (as logical slots on the primary should not > be able to advance as soon as the walreceiver is shutdown during the > promotion). > I think it could not ensure the slots are in sync, because there is no guarantee that the logical slot has caught up to the physical standby on promotion and logical publisher and subscriber both could still be active during promotion. IOW, the logical slot's LSN can still be advanced after the walreceiver shutdown if it was far bebind the physical slot's LSN. Best Regards, Hou zj
Hi, On 11/24/23 4:35 AM, Zhijie Hou (Fujitsu) wrote: > On Thursday, November 23, 2023 11:45 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > IOW, the logical slot's LSN can still be advanced after the > walreceiver shutdown if it was far bebind the physical slot's LSN. > oh yeah right, it would need much more work/discussion to handle this case. As mentioned up-thread for publisher/subscriber I think it's fine (thanks to the replication origin linked to the subscriber) but for anything else that don't make use of replication origin (or similar approach to re-start the decoding from the right place after promotion) I feel like the user experience is not as good. It may not be worth it to work on it for V1 but maybe something to keep in mind as improvement for later? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/23/23 11:45 AM, Amit Kapila wrote: > On Wed, Nov 22, 2023 at 10:02 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > >> Or we could just document that it is user's responsibility to match the failover >> property in case it changes the slot_name. >> > > Personally, I think we should document this behavior instead of > complicating the patch and the user anyway has a way to achieve it. Same point of view. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Nov 24, 2023 at 1:53 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/24/23 4:35 AM, Zhijie Hou (Fujitsu) wrote: > > On Thursday, November 23, 2023 11:45 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > > > IOW, the logical slot's LSN can still be advanced after the > > walreceiver shutdown if it was far bebind the physical slot's LSN. > > > > oh yeah right, it would need much more work/discussion to handle this case. > > As mentioned up-thread for publisher/subscriber I think it's fine > (thanks to the replication origin linked to the subscriber) but for > anything else that don't make use of replication origin (or similar > approach to re-start the decoding from the right place after promotion) > I feel like the user experience is not as good. > > It may not be worth it to work on it for V1 but maybe something to keep > in mind as improvement for later? > Agreed, we can think of improving it in the future but there is no correctness issue with the current implementation (not trying to do the last fetch after the promotion signal). -- With Regards, Amit Kapila.
Hi, On 11/24/23 10:45 AM, Amit Kapila wrote: > On Fri, Nov 24, 2023 at 1:53 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> On 11/24/23 4:35 AM, Zhijie Hou (Fujitsu) wrote: >>> On Thursday, November 23, 2023 11:45 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: >>> >>> IOW, the logical slot's LSN can still be advanced after the >>> walreceiver shutdown if it was far bebind the physical slot's LSN. >>> >> >> oh yeah right, it would need much more work/discussion to handle this case. >> >> As mentioned up-thread for publisher/subscriber I think it's fine >> (thanks to the replication origin linked to the subscriber) but for >> anything else that don't make use of replication origin (or similar >> approach to re-start the decoding from the right place after promotion) >> I feel like the user experience is not as good. >> >> It may not be worth it to work on it for V1 but maybe something to keep >> in mind as improvement for later? >> > > Agreed, we can think of improving it in the future but there is no > correctness issue with the current implementation (not trying to do > the last fetch after the promotion signal). > Yeah agree, no correctness issue, my remark was all about trying to improve the user experience in some cases. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, November 21, 2023 1:39 PM Peter Smith <smithpb2250@gmail.com> wrote: Hi, Thanks for the comments. > > ====== > doc/src/sgml/catalogs.sgml > > 6. > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>subfailoverstate</structfield> <type>char</type> > + </para> > + <para> > + State codes for failover mode: > + <literal>d</literal> = disabled, > + <literal>p</literal> = pending enablement, > + <literal>e</literal> = enabled > + </para></entry> > + </row> > + > > This attribute is very similar to the 'subtwophasestate' so IMO it would be > better to be adjacent to that one in the docs. > > (probably this means putting it in the same order in the catalog also, assuming > that is allowed) It's allowed, but I think the functionality of two fields are different and I didn’t find the correlation between two fields except for the type of value. So I didn't change the order. > > ~~~ > > 12. AlterSubscription > > + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && > + opts.copy_data) ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when failover is enabled"), > + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = > false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); > > ~ > > 12b. > AFAIK when there are messages like this that differ only by non-translatable > things ("failover" option) then that non-translatable thing should be extracted > as a parameter so the messages are common. > And, don't forget to add a /* translator: %s is a subscription option like > 'failover' */ comment. > > SUGGESTION like: > errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when %s is enabled", "two_phase") errmsg("ALTER SUBSCRIPTION with refresh > and copy_data is not allowed when %s is enabled", "failover") I am not sure about changing the existing message here, I feel you can start a separate thread to change the twophase related messages, and we can change accordingly if it's accepted. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 15. libpqrcv_create_slot > > + if (failover) > + { > + appendStringInfoString(&cmd, "FAILOVER"); if (use_new_options_syntax) > + appendStringInfoString(&cmd, ", "); else appendStringInfoChar(&cmd, ' > + '); } > > 15a. > Isn't failover a new option that is unsupported pre-PG17? Why is it necessary to > support an old-style syntax for something that was not supported on old > servers? (I'm confused). > > ~ > > 15b. > Also IIRC, this FAILOVER wasn't not listed in the old-style syntax of > doc/src/sgml/protocol.sgml. Was that deliberate? We don't support FAILOVER for old-style syntax and pre-PG17, libpqrcv_create_slot is only building the replication command string and we will add failover in the string so that the publisher will report errors if it doesn't support these options ,the same is true for two_phase. > ~~~ > > 24. ReplicationSlotAlter > +/* > + * Change the definition of the slot identified by the passed in name. > + */ > +void > +ReplicationSlotAlter(const char *name, bool failover) > > /the definition/the failover state/ I kept this as it's a general function but we only support changing failover state for now. > ~~~ > > 28. 
check_standby_slot_names > > +bool > +check_standby_slot_names(char **newval, void **extra, GucSource source) > +{ if (strcmp(*newval, "") == 0) return true; > + > + /* > + * "*" is not accepted as in that case primary will not be able to know > + * for which all standbys to wait for. Even if we have physical-slots > + * info, there is no way to confirm whether there is any standby > + * configured for the known physical slots. > + */ > + if (strcmp(*newval, "*") == 0) > + { > + GUC_check_errdetail("\"%s\" is not accepted for standby_slot_names", > + *newval); return false; } > + > + /* Now verify if the specified slots really exist and have correct > + type */ if (!validate_standby_slots(newval)) return false; > + > + *extra = guc_strdup(ERROR, *newval); > + > + return true; > +} > > Is it really necessary to have a special test for the special value "*" which you are > going to reject? I don't see why this should be any different from checking for > other values like "." or "$" or "?" etc. > Why not just let validate_standby_slots() handle all of these? SplitIdentifierString() does not give error for '*' and '*' can be considered as valid value which if accepted can mislead user that all the standbys's slots are now considered, which is not the case here. So we want to explicitly call out this case i.e. '*' is not accepted as valid value for standby_slot_names. > > ~~~ > > 29. assign_standby_slot_names > > + /* No value is specified for standby_slot_names. */ if > + (standby_slot_names_cpy == NULL) return; > > Is this possible? IIUC the check_standby_slot_names() did: > *extra = guc_strdup(ERROR, *newval); > > Maybe this code also needs a similar elog and comment like already in this > function: > /* This should not happen if GUC checked check_standby_slot_names. */ This case is possible, standby_slot_names_cpy(e.g. extra pointer) is NULL if no value("") is specified for the GUC.(see the code in check_standby_slot_names). > ~ > > 30. assign_standby_slot_names > > + char *standby_slot_names_cpy = extra; > > IIUC, the 'extra' was unconditionally guc_strdup()'ed in the check hook, so > should we also free it here before leaving this function? No, as mentioned in src/backend/utils/misc/README, the space of extra will be automatically freed when the associated GUC setting is no longer of interest. > > ~~~ > > 31. GetStandbySlotList > > +/* > + * Return a copy of standby_slot_names_list if the copy flag is set to > +true, > + * otherwise return the original list. > + */ > +List * > +GetStandbySlotList(bool copy) > +{ > + if (copy) > + return list_copy(standby_slot_names_list); > + else > + return standby_slot_names_list; > +} > > Why is this better than just exposing the standby_slot_names_list. The caller > can make a copy or not. > e.g. why is calling GetStandbySlotList(true) better than just doing > list_copy(standby_slot_names_list)? I think either way is fine, but I prefer not to add one global variable if possible. > > ~~~ > > 34. WalSndFilterStandbySlots > > + /* Log warning if no active_pid for this physical slot */ if > + (slot->active_pid == 0) ereport(WARNING, > > Other nearby code is guarding the slot in case it was NULL, so why not here? Is > it a potential NPE? I think it will not pass the check for restart_lsn before the active_pid if slot is NULL. > > ~~~ > > 35. > + /* > + * If logical slot name is given in standby_slot_names, give WARNING > + * and skip it. Since it is harmless, so WARNING should be enough, no > + * need to error-out. 
> + */ > + else if (SlotIsLogical(slot)) > + warningfmt = _("cannot have logical replication slot \"%s\" in > parameter \"%s\", ignoring"); > > Is this possible? Doesn't the function 'validate_standby_slots' called by the GUC > hook prevent specifying logical slots in the GUC? Maybe this warning should be > changed to Assert? I think user could drop the logical slot and recreate a physical slot with the same name without changing the GUC. > > ~~~ > > 36. > + /* > + * Reaching here indicates that either the slot has passed the > + * wait_for_lsn or there is an issue with the slot that requires a > + * warning to be reported. > + */ > + if (warningfmt) > + ereport(WARNING, errmsg(warningfmt, name, "standby_slot_names")); > + > + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc); > > If something was wrong with the slot that required a warning, is it really OK to > remove this slot from the list? This seems contrary to the function comment > which only talks about removing slots that have caught up. I think it's OK to remove slots if it's invalidated, dropped, or was changed to logical one as we don't need to wait for these slots to catch up anymore. > ~~~ > > 41. > /* > - * Fast path to avoid acquiring the spinlock in case we already know we > - * have enough WAL available. This is particularly interesting if we're > - * far behind. > + * Check if all the standby servers have confirmed receipt of WAL upto > + * RecentFlushPtr if we already know we have enough WAL available. > + * > + * Note that we cannot directly return without checking the status of > + * standby servers because the standby_slot_names may have changed, > + which > + * means there could be new standby slots in the list that have not yet > + * caught up to the RecentFlushPtr. > */ > if (RecentFlushPtr != InvalidXLogRecPtr && > loc <= RecentFlushPtr) > - return RecentFlushPtr; > + { > + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots); > > 41b. > IMO there is some missing information in this comment because it wasn't clear > to me that calling WalSndFilterStandbySlots was going to side-efect that list to > give it a different meaning. e.g. it seems it no longer means "standby slots" but > instead means something like "standby slots that are not caught up". Perhaps > that local variable can have a name that helps to convey that better? I am not sure about this, WalSndFilterStandbySlots already indicates it will filter the slot list which seems clear to me. But if you have better ideas, we can adjust in next version. > > ~~~ > > 44. > + if (wait_for_standby) > + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); > + else if (MyWalSnd->kind == REPLICATION_KIND_PHYSICAL) > ConditionVariablePrepareToSleep(&WalSndCtl->wal_flush_cv); > else if (MyWalSnd->kind == REPLICATION_KIND_LOGICAL) > ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv); > ~ > > A walsender is either physical or logical, but here the 'wait_for_standby' flag > overrides everything. Is it OK for this to be if/else/else or should this code call > for wal_confirm_rcv_cv AND the other one? No, we cannot prepare to sleep twice(see the comment in ConditionVariablePrepareToSleep()). > ====== > src/include/catalog/pg_subscription.h > > 54. > /* > * two_phase tri-state values. See comments atop worker.c to know more > about > * these states. 
> */ > #define LOGICALREP_TWOPHASE_STATE_DISABLED 'd' > #define LOGICALREP_TWOPHASE_STATE_PENDING 'p' > #define LOGICALREP_TWOPHASE_STATE_ENABLED 'e' > > #define LOGICALREP_FAILOVER_STATE_DISABLED 'd' > #define LOGICALREP_FAILOVER_STATE_PENDING 'p' > #define LOGICALREP_FAILOVER_STATE_ENABLED 'e' > > ~ > > 54a. > There should either be another comment (like the 'two_phase tri-state' > one) added for the FAILOVER states or that existing comment should be > expanded so that it also mentions the 'failover' tri-states. > > ~ > > 54b. > Idea: If you are willing to change the constant names (not the values) of the > current tri-states then now both the 'two_phase' and 'failover' > could share them -- I also think this might give the ability to create macros (if > wanted) or to share more code instead of always handling failover and > two_phase separately. > > SUGGESTION > #define LOGICALREP_TRISTATE_DISABLED 'd' > #define LOGICALREP_TRISTATE_PENDING 'p' > #define LOGICALREP_TRISTATE_ENABLED 'e' I am not sure about the idea, but if others also prefer this then we can adjust the code. ~~~ On Wednesday, November 22, 2023 3:42 PM Peter Smith <smithpb2250@gmail.com> wrote: > 6. > +# The subscription that's up and running and is enabled for failover # > +doesn't get the data from primary and keeps waiting for the # standby > +specified in standby_slot_names. > +$result = $subscriber1->safe_psql('postgres', > + "SELECT count(*) = 0 FROM tab_int;"); > +is($result, 't', "subscriber1 doesn't get data from primary until > standby1 acknowledges changes"); > > Might it be better to write as "SELECT count(*) = $primary_row_count FROM > tab_int;" and expect it to return false? Ensuring the number is 0 looks better to me. Attach the V38 patch set which addressed all comments in [1][2] except for the ones that mentioned above. [1] https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD55bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPuEGX5kr0xh06yv8ndoAQvDNedoec1OqOq3GMxDN6p%3D9A%40mail.gmail.com Best Regards, Hou zj
Attachment
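For readers following the standby_slot_names discussion above, here is a minimal configuration sketch of the behaviour the patch set describes; the slot name is illustrative and not taken from the patch:

    # postgresql.conf on the primary
    standby_slot_names = 'standby1_phys'    # physical slot(s) used by the standby(s)

With such a setting, a walsender streaming from a failover-enabled logical slot is expected to hold back changes until every physical slot listed here has confirmed the corresponding WAL position, while slots in the list that turn out to be invalidated, dropped, or logical are skipped with a WARNING, as discussed above.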
On Monday, November 27, 2023 12:03 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V38 patch set which addressed all comments in [1][2] except for the > ones that mentioned above. > > [1] > https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD5 > 5bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAHut%2BPuEGX5kr0xh06yv8ndoA > QvDNedoec1OqOq3GMxDN6p%3D9A%40mail.gmail.com I didn't increment the patch version, sorry for that. Attach the same patch set but increment the patch version to V39. Best Regards, Hou zj
Attachment
Hi, On 11/27/23 7:02 AM, Zhijie Hou (Fujitsu) wrote: > On Monday, November 27, 2023 12:03 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: >> >> Attach the V38 patch set which addressed all comments in [1][2] except for the >> ones that mentioned above. >> >> [1] >> https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD5 >> 5bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com >> [2] >> https://www.postgresql.org/message-id/CAHut%2BPuEGX5kr0xh06yv8ndoA >> QvDNedoec1OqOq3GMxDN6p%3D9A%40mail.gmail.com > > I didn't increment the patch version, sorry for that. Attach the same patch set > but increment the patch version to V39. Thanks! It looks like v39 does not contain (some / all?) the changes that have been done in v38 [1]. For example, slot_exists_in_list() still exists in v39 while it was renamed to validate_sync_slot() in v38. [1]: https://www.postgresql.org/message-id/CAJpy0uD6dWUvBgy8MGdugf_Am4pLXTL_vqcwSeHO13v%2BMzc9KA%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Nov 27, 2023 at 2:15 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/27/23 7:02 AM, Zhijie Hou (Fujitsu) wrote: > > On Monday, November 27, 2023 12:03 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > >> > >> Attach the V38 patch set which addressed all comments in [1][2] except for the > >> ones that mentioned above. > >> > >> [1] > >> https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD5 > >> 5bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com > >> [2] > >> https://www.postgresql.org/message-id/CAHut%2BPuEGX5kr0xh06yv8ndoA > >> QvDNedoec1OqOq3GMxDN6p%3D9A%40mail.gmail.com > > > > I didn't increment the patch version, sorry for that. Attach the same patch set > > but increment the patch version to V39. > > Thanks! > > It looks like v39 does not contain (some / all?) the changes that have been > done in v38 [1]. > > For example, slot_exists_in_list() still exists in v39 while it was renamed to > validate_sync_slot() in v38. > Yes, I noticed that and informed Hou-san about this. New patches will be posted soon with the correction. Meanwhile, please review v38 instead if you intend to review patch002 right now. v39 is supposed to have changes in patch001 alone. thanks Shveta
On Monday, November 27, 2023 4:51 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Nov 27, 2023 at 2:15 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 11/27/23 7:02 AM, Zhijie Hou (Fujitsu) wrote: > > > On Monday, November 27, 2023 12:03 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > >> > > >> Attach the V38 patch set which addressed all comments in [1][2] > > >> except for the ones that mentioned above. > > >> > > >> [1] > > >> > https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD5 > > >> 5bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com > > >> [2] > > >> > https://www.postgresql.org/message-id/CAHut%2BPuEGX5kr0xh06yv8ndoA > > >> QvDNedoec1OqOq3GMxDN6p%3D9A%40mail.gmail.com > > > > > > I didn't increment the patch version, sorry for that. Attach the > > > same patch set but increment the patch version to V39. > > > > Thanks! > > > > It looks like v39 does not contain (some / all?) the changes that have > > been done in v38 [1]. > > > > For example, slot_exists_in_list() still exists in v39 while it was > > renamed to > > validate_sync_slot() in v38. > > > > Yes, I noticed that and informed Hou-san about this. New patches will be > posted soon with the correction. Meanwhile, please review v38 instead if you > intend to review patch002 right now. v39 is supposed to have changes in > patch001 alone. Here is the updated version(v39_2) which include all the changes made in 0002. Please use for review, and sorry for the confusion. Best Regards, Hou zj
Attachment
On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the updated version(v39_2) which include all the changes made in 0002. > Please use for review, and sorry for the confusion. > --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -8,20 +8,27 @@ * src/backend/replication/logical/launcher.c * * NOTES - * This module contains the logical replication worker launcher which - * uses the background worker infrastructure to start the logical - * replication workers for every enabled subscription. + * This module contains the replication worker launcher which + * uses the background worker infrastructure to: + * a) start the logical replication workers for every enabled subscription + * when not in standby_mode. + * b) start the slot sync worker for logical failover slots synchronization + * from the primary server when in standby_mode. I was wondering do we really need a launcher on standby to invoke sync-slot worker. If so, why? I guess it may be required for previous versions where we were managing work for multiple slot-sync workers which is also questionable in the sense of whether launcher is the right candidate for the same but now with the single slot-sync worker, it doesn't seem worth having it. What do you think? -- With Regards, Amit Kapila.
Hi, On 11/6/23 2:30 AM, Zhijie Hou (Fujitsu) wrote: > On Friday, November 3, 2023 7:32 PM Amit Kapila <amit.kapila16@gmail.com> >> >> I don't see a corresponding change in repl_gram.y. I think the following part of >> the code needs to be changed: >> /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */ >> | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT >> create_slot_options >> > > I think after 0266e98, we started to use the new syntax(see the > generic_option_list rule) and we can avoid changing the repl_gram.y when adding > new options. The new failover can be detected when parsing the generic option > list(in parseCreateReplSlotOptions). Did not look in details but it looks like there is more to do here as this is failing (with v39_2): " postgres@primary: psql replication=database psql (17devel) Type "help" for help. postgres=# CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput FAILOVER; ERROR: syntax error " Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Monday, November 27, 2023 8:05 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: Hi, > On 11/6/23 2:30 AM, Zhijie Hou (Fujitsu) wrote: > > On Friday, November 3, 2023 7:32 PM Amit Kapila > <amit.kapila16@gmail.com> > >> > >> I don't see a corresponding change in repl_gram.y. I think the following part > of > >> the code needs to be changed: > >> /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] > */ > >> | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT > >> create_slot_options > >> > > > > I think after 0266e98, we started to use the new syntax(see the > > generic_option_list rule) and we can avoid changing the repl_gram.y when > adding > > new options. The new failover can be detected when parsing the generic > option > > list(in parseCreateReplSlotOptions). > > Did not look in details but it looks like there is more to do here as > this is failing (with v39_2): > > " > postgres@primary: psql replication=database > psql (17devel) > Type "help" for help. > > postgres=# CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput > FAILOVER; > ERROR: syntax error I think the command you executed is of old syntax style, which was kept for compatibility with older releases. And I think we can avoid supporting new option for the old syntax as described in the original thread[1] of commit 0266e98. So, the "syntax error" is as expected IIUC. The new style command is like: CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput (FAILOVER); [1] https://www.postgresql.org/message-id/CA%2BTgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx%3DHYvBinefNH8Q%40mail.gmail.com Best Regards, Hou zj
Hi, On 11/27/23 1:23 PM, Zhijie Hou (Fujitsu) wrote: > On Monday, November 27, 2023 8:05 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: >> Did not look in details but it looks like there is more to do here as >> this is failing (with v39_2): >> >> " >> postgres@primary: psql replication=database >> psql (17devel) >> Type "help" for help. >> >> postgres=# CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput >> FAILOVER; >> ERROR: syntax error > > I think the command you executed is of old syntax style, which was kept for > compatibility with older releases. And I think we can avoid supporting new > option for the old syntax as described in the original thread[1] of commit > 0266e98. So, the "syntax error" is as expected IIUC. > > The new style command is like: > CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput (FAILOVER); > > [1] https://www.postgresql.org/message-id/CA%2BTgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx%3DHYvBinefNH8Q%40mail.gmail.com > Oh, I see, thanks for pointing out. Well, not related to that thread but it seems weird to me that the backward compatibility is done at the "option" level then. I think it would make more sense to support all the options if the old syntax is still supported. For example, having postgres=# CREATE_REPLICATION_SLOT test_logical2 LOGICAL pgoutput TWO_PHASE; working fine but CREATE_REPLICATION_SLOT test_logical3 LOGICAL pgoutput FAILOVER; failing looks weird to me. But that's probably out of this thread's context anyway. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
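Putting the two observations above side by side as a quick session sketch (the slot names are the ones used in the preceding mails): the old keyword-style syntax still works for options that predate the generic option list introduced by commit 0266e98, while new options such as FAILOVER are only accepted through the parenthesized list:

    $ psql "dbname=postgres replication=database"
    postgres=# CREATE_REPLICATION_SLOT test_logical2 LOGICAL pgoutput TWO_PHASE;
    postgres=# CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput (FAILOVER);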
Hi, On 11/27/23 1:23 PM, Zhijie Hou (Fujitsu) wrote: > On Monday, November 27, 2023 8:05 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > >> On 11/6/23 2:30 AM, Zhijie Hou (Fujitsu) wrote: >>> On Friday, November 3, 2023 7:32 PM Amit Kapila >> <amit.kapila16@gmail.com> >>>> >>>> I don't see a corresponding change in repl_gram.y. I think the following part >> of >>>> the code needs to be changed: >>>> /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] >> */ >>>> | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT >>>> create_slot_options >>>> >>> >>> I think after 0266e98, we started to use the new syntax(see the >>> generic_option_list rule) and we can avoid changing the repl_gram.y when >> adding >>> new options. The new failover can be detected when parsing the generic >> option >>> list(in parseCreateReplSlotOptions). >> >> Did not look in details but it looks like there is more to do here as >> this is failing (with v39_2): >> >> " >> postgres@primary: psql replication=database >> psql (17devel) >> Type "help" for help. >> >> postgres=# CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput >> FAILOVER; >> ERROR: syntax error > > I think the command you executed is of old syntax style, which was kept for > compatibility with older releases. And I think we can avoid supporting new > option for the old syntax as described in the original thread[1] of commit > 0266e98. So, the "syntax error" is as expected IIUC. > > The new style command is like: > CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput (FAILOVER); > If / As we are not going to support the old syntax for the FAILOVER option so I think we can get rid of the check on "use_new_options_syntax" here: - + if (failover) + { + appendStringInfoString(&cmd, "FAILOVER"); + if (use_new_options_syntax) + appendStringInfoString(&cmd, ", "); + else + appendStringInfoChar(&cmd, ' '); + } as we'd error out before if using the old syntax. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Here is the updated version(v39_2) which include all the changes made in 0002. > > Please use for review, and sorry for the confusion. > > > > --- a/src/backend/replication/logical/launcher.c > +++ b/src/backend/replication/logical/launcher.c > @@ -8,20 +8,27 @@ > * src/backend/replication/logical/launcher.c > * > * NOTES > - * This module contains the logical replication worker launcher which > - * uses the background worker infrastructure to start the logical > - * replication workers for every enabled subscription. > + * This module contains the replication worker launcher which > + * uses the background worker infrastructure to: > + * a) start the logical replication workers for every enabled subscription > + * when not in standby_mode. > + * b) start the slot sync worker for logical failover slots synchronization > + * from the primary server when in standby_mode. > > I was wondering do we really need a launcher on standby to invoke > sync-slot worker. If so, why? I guess it may be required for previous > versions where we were managing work for multiple slot-sync workers > which is also questionable in the sense of whether launcher is the > right candidate for the same but now with the single slot-sync worker, > it doesn't seem worth having it. What do you think? > > -- Yes, earlier a manager process was needed to manage multiple slot-sync workers and distribute load among them, but now that does not seem necessary. I gave it a try (PoC) and it seems to work well. If there are no objections to this approach, I can share the patch soon. thanks Shveta
Hi, On 11/28/23 4:13 AM, shveta malik wrote: > On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) >> <houzj.fnst@fujitsu.com> wrote: >>> >>> Here is the updated version(v39_2) which include all the changes made in 0002. >>> Please use for review, and sorry for the confusion. >>> >> >> --- a/src/backend/replication/logical/launcher.c >> +++ b/src/backend/replication/logical/launcher.c >> @@ -8,20 +8,27 @@ >> * src/backend/replication/logical/launcher.c >> * >> * NOTES >> - * This module contains the logical replication worker launcher which >> - * uses the background worker infrastructure to start the logical >> - * replication workers for every enabled subscription. >> + * This module contains the replication worker launcher which >> + * uses the background worker infrastructure to: >> + * a) start the logical replication workers for every enabled subscription >> + * when not in standby_mode. >> + * b) start the slot sync worker for logical failover slots synchronization >> + * from the primary server when in standby_mode. >> >> I was wondering do we really need a launcher on standby to invoke >> sync-slot worker. If so, why? I guess it may be required for previous >> versions where we were managing work for multiple slot-sync workers >> which is also questionable in the sense of whether launcher is the >> right candidate for the same but now with the single slot-sync worker, >> it doesn't seem worth having it. What do you think? >> >> -- > > Yes, earlier a manager process was needed to manage multiple slot-sync > workers and distribute load among them, but now that does not seem > necessary. I gave it a try (PoC) and it seems to work well. If there > are no objections to this approach, I can share the patch soon. > +1 on this new approach, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/2/23 1:27 AM, Zhijie Hou (Fujitsu) wrote: > On Tuesday, October 31, 2023 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> We have create_replication_slot and drop_replication_slot in repl_gram.y. How >> about if introduce alter_replication_slot and handle the 'failover' flag with that? >> The idea is we will either enable 'failover' at the time create_replication_slot by >> providing an optional failover option or execute a separate command >> alter_replication_slot. I think we probably need to perform this command >> before the start of streaming. > > Here is an attempt to achieve the same. I added a new replication command > alter_replication_slot and introduced a walreceiver api walrcv_alter_slot to > execute the command. The subscription will call the api to enable/disable > the failover of the slot on publisher. > > The patch disallows altering the failover option for the subscription. But we > could release the restriction by using the following approaches in next version: > >> I think we will have the following options to allow alter of the 'failover' >> property: (a) we can allow altering 'failover' only for the 'disabled' >> subscription; to achieve that, we need to open a connection during alter >> subscription and change this property of slot; (b) apply worker detects the >> change in 'failover' option; run the alter_replication_slot command; this needs >> more analysis as apply_worker is already doing streaming and changing slot >> property in between could be tricky. > What do you think about also adding a pg_alter_logical_replication_slot() or such function? That would allow users to alter manually created logical replication slots without the need to make a replication connection. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
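To make the suggestion above concrete, a purely hypothetical invocation; neither this function nor its signature exists anywhere yet, and the named parameter is only an assumption about what such an interface could look like:

    -- hypothetical SQL-level alternative to altering the slot over a
    -- replication connection
    SELECT pg_alter_logical_replication_slot('my_manual_slot', failover => true);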
On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/28/23 4:13 AM, shveta malik wrote: > > On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) > >> <houzj.fnst@fujitsu.com> wrote: > >>> > >>> Here is the updated version(v39_2) which include all the changes made in 0002. > >>> Please use for review, and sorry for the confusion. > >>> > >> > >> --- a/src/backend/replication/logical/launcher.c > >> +++ b/src/backend/replication/logical/launcher.c > >> @@ -8,20 +8,27 @@ > >> * src/backend/replication/logical/launcher.c > >> * > >> * NOTES > >> - * This module contains the logical replication worker launcher which > >> - * uses the background worker infrastructure to start the logical > >> - * replication workers for every enabled subscription. > >> + * This module contains the replication worker launcher which > >> + * uses the background worker infrastructure to: > >> + * a) start the logical replication workers for every enabled subscription > >> + * when not in standby_mode. > >> + * b) start the slot sync worker for logical failover slots synchronization > >> + * from the primary server when in standby_mode. > >> > >> I was wondering do we really need a launcher on standby to invoke > >> sync-slot worker. If so, why? I guess it may be required for previous > >> versions where we were managing work for multiple slot-sync workers > >> which is also questionable in the sense of whether launcher is the > >> right candidate for the same but now with the single slot-sync worker, > >> it doesn't seem worth having it. What do you think? > >> > >> -- > > > > Yes, earlier a manager process was needed to manage multiple slot-sync > > workers and distribute load among them, but now that does not seem > > necessary. I gave it a try (PoC) and it seems to work well. If there > > are no objections to this approach, I can share the patch soon. > > > > +1 on this new approach, thanks! PFA v40. This patch has removed Logical Replication Launcher support to launch slotsync worker. The slot-sync worker is now registered as bgworker with postmaster, with bgw_start_time=BgWorkerStart_ConsistentState and bgw_restart_time=60sec. On removal of launcher, now all the validity checks have been shifted to slot-sync worker itself. This brings us to some point of concerns: a) We still need to maintain RecoveryInProgress() check in slotsync worker. Since worker has the start time of BgWorkerStart_ConsistentState, it will be started on non-standby as well. So to ensure that it exists on non-standby, "RecoveryInProgress" has been introduced at the beginning of the worker. But once it exits, postmaster will not restart it since it will be clean-exist i.e. proc_exit(0) (the restart logic of postmaster comes into play only when there is an abnormal exit). But to exit for the first time on non-standby, we need that Recovery related check in worker. b) "enable_syncslot" check is moved to slotsync worker now. Since enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to exit the worker if 'enable_syncslot' is found to be disabled. 'proc_exit(1)' has been used in order to ensure that the worker is restarted and GUCs are checked again after restart_time. Downside of this approach is, if someone has kept "enable_syncslot" as disabled permanently even on standby, slotsync worker will keep on restarting and exiting. 
So to overcome the above pain-points, I think a potential approach will be to start the slotsync worker only if 'enable_syncslot' is on and the system is non-standby. Potential ways (each with some issues) are: 1) Use the current way i.e. register the slot-sync worker as a bgworker with the postmaster, but introduce extra checks in 'maybe_start_bgworkers'. But this seems more like a hack. This will need extra changes as currently, once 'maybe_start_bgworkers' is attempted by the postmaster, it will attempt again to start any worker only if the worker had an abnormal exit and restart_time != 0. The current postmaster will not attempt to start a worker on any GUC change. 2) Another way may be to treat the slotsync worker as a special case and separate out the start/restart of the slotsync worker from the bgworker machinery, following what we do for the autovacuum launcher (StartAutoVacLauncher) to keep starting it in the postmaster loop (ServerLoop). In this way, we may be able to add more checks before starting the worker. But by opting for this approach, we will have to manage the slotsync worker completely by ourselves as it will no longer be part of the existing bgworker-registration infra. If this seems okay and there are no other better options, it can be analyzed further in detail. 3) Another approach could be, in order to solve issue (a), to introduce a new start_time 'BgWorkerStart_ConsistentState_HotStandby', which means start a bgworker only if a consistent state is reached and the system is standby. And for issue (b), let's retain the check of enable_syncslot in the worker itself but make it 'PGC_POSTMASTER'. This will ensure we can safely exit the worker (proc_exit(0)) if enable_syncslot is disabled, and the postmaster will not restart it. But I'm not sure if making it "PGC_POSTMASTER" is acceptable from the user's perspective. thanks Shveta
Attachment
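For context, here is a rough sketch of what registering such a worker with the postmaster looks like, using the parameters mentioned above (BgWorkerStart_ConsistentState, 60 second restart time); the function name, entry point and display names below are illustrative assumptions, not the ones used in the patch:

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    void
    SlotSyncWorkerRegister(void)
    {
        BackgroundWorker bgw;

        memset(&bgw, 0, sizeof(bgw));
        /* needs shared memory access and a database connection */
        bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
            BGWORKER_BACKEND_DATABASE_CONNECTION;
        /* start once a consistent state has been reached */
        bgw.bgw_start_time = BgWorkerStart_ConsistentState;
        /* restart 60 seconds after an abnormal exit */
        bgw.bgw_restart_time = 60;
        snprintf(bgw.bgw_library_name, sizeof(bgw.bgw_library_name), "postgres");
        snprintf(bgw.bgw_function_name, sizeof(bgw.bgw_function_name),
                 "ReplSlotSyncWorkerMain");
        snprintf(bgw.bgw_name, sizeof(bgw.bgw_name),
                 "replication slot sync worker");
        snprintf(bgw.bgw_type, sizeof(bgw.bgw_type),
                 "replication slot sync worker");
        bgw.bgw_main_arg = (Datum) 0;
        bgw.bgw_notify_pid = 0;

        RegisterBackgroundWorker(&bgw);
    }

Point of concern (a) above follows directly from the bgw_start_time line: with BgWorkerStart_ConsistentState the postmaster starts the worker on a primary too, which is why the worker itself has to check RecoveryInProgress() and exit cleanly; the alternatives (1)-(3) discussed just above would move that decision to before the worker is started at all.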
On Tue, Nov 28, 2023 at 3:10 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 11/28/23 4:13 AM, shveta malik wrote: > > > On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > >> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) > > >> <houzj.fnst@fujitsu.com> wrote: > > >>> > > >>> Here is the updated version(v39_2) which include all the changes made in 0002. > > >>> Please use for review, and sorry for the confusion. > > >>> > > >> > > >> --- a/src/backend/replication/logical/launcher.c > > >> +++ b/src/backend/replication/logical/launcher.c > > >> @@ -8,20 +8,27 @@ > > >> * src/backend/replication/logical/launcher.c > > >> * > > >> * NOTES > > >> - * This module contains the logical replication worker launcher which > > >> - * uses the background worker infrastructure to start the logical > > >> - * replication workers for every enabled subscription. > > >> + * This module contains the replication worker launcher which > > >> + * uses the background worker infrastructure to: > > >> + * a) start the logical replication workers for every enabled subscription > > >> + * when not in standby_mode. > > >> + * b) start the slot sync worker for logical failover slots synchronization > > >> + * from the primary server when in standby_mode. > > >> > > >> I was wondering do we really need a launcher on standby to invoke > > >> sync-slot worker. If so, why? I guess it may be required for previous > > >> versions where we were managing work for multiple slot-sync workers > > >> which is also questionable in the sense of whether launcher is the > > >> right candidate for the same but now with the single slot-sync worker, > > >> it doesn't seem worth having it. What do you think? > > >> > > >> -- > > > > > > Yes, earlier a manager process was needed to manage multiple slot-sync > > > workers and distribute load among them, but now that does not seem > > > necessary. I gave it a try (PoC) and it seems to work well. If there > > > are no objections to this approach, I can share the patch soon. > > > > > > > +1 on this new approach, thanks! > > PFA v40. This patch has removed Logical Replication Launcher support > to launch slotsync worker. The slot-sync worker is now registered as > bgworker with postmaster, with > bgw_start_time=BgWorkerStart_ConsistentState and > bgw_restart_time=60sec. > > On removal of launcher, now all the validity checks have been shifted > to slot-sync worker itself. This brings us to some point of concerns: > > a) We still need to maintain RecoveryInProgress() check in slotsync > worker. Since worker has the start time of > BgWorkerStart_ConsistentState, it will be started on non-standby as > well. So to ensure that it exists on non-standby, "RecoveryInProgress" > has been introduced at the beginning of the worker. But once it exits, > postmaster will not restart it since it will be clean-exist i.e. > proc_exit(0) (the restart logic of postmaster comes into play only > when there is an abnormal exit). But to exit for the first time on > non-standby, we need that Recovery related check in worker. > > b) "enable_syncslot" check is moved to slotsync worker now. Since > enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to > exit the worker if 'enable_syncslot' is found to be disabled. > 'proc_exit(1)' has been used in order to ensure that the worker is > restarted and GUCs are checked again after restart_time. 
Downside of > this approach is, if someone has kept "enable_syncslot" as disabled > permanently even on standby, slotsync worker will keep on restarting > and exiting. > > So to overcome the above pain-points, I think a potential approach > will be to start slotsync worker only if 'enable_syncslot' is on and > the system is non-standby. Potential ways (each with some issues) are: > Correction here: start slotsync worker only if 'enable_syncslot' is on and the system is standby. > 1) Use the current way i.e. register slot-sync worker as bgworker with > postmaster, but introduce extra checks in 'maybe_start_bgworkers'. But > this seems more like a hack. This will need extra changes as currently > once 'maybe_start_bgworkers' is attempted by postmaster, it will > attempt again to start any worker only if the worker had abnormal exit > and restart_time !=0. The current postmatser will not attempt to start > worker on any GUC change. > > 2) Another way maybe to treat slotsync worker as special case and > separate out the start/restart of slotsync worker from bgworker, and > follow what we do for autovacuum launcher(StartAutoVacLauncher) to > keep starting it in the postmaster loop(ServerLoop). In this way, we > may be able to add more checks before starting worker. But by opting > this approach, we will have to manage slotsync worker completely by > ourself as it will be no longer be part of existing > bgworker-registration infra. If this seems okay and there are no other > better options, it can be analyzed further in detail. > > 3) Another approach could be, in order to solve issue (a), introduce a > new start_time 'BgWorkerStart_ConsistentState_HotStandby' which means > start a bgworker only if consistent state is reached and the system is > standby. And for issue (b), lets retain check of enable_syncslot in > the worker itself but make it 'PGC_POSTMASTER'. This will ensure we > can safely exit the worker(proc_exit(0) if enable_syncslot is disabled > and postmaster will not restart it. But I'm not sure if making it > "PGC_POSTMASTER" is acceptable from the user's perspective. > > thanks > Shveta
On Fri, Nov 17, 2023 at 5:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v35. > > > > Review v35-0002* > ============== > 1. > As quoted in the commit message, > > > If a logical slot is invalidated on the primary, slot on the standby is also > invalidated. If a logical slot on the primary is valid but is invalidated > on the standby due to conflict (say required rows removed on the primary), > then that slot is dropped and recreated on the standby in next sync-cycle. > It is okay to recreate such slots as long as these are not consumable on the > standby (which is the case currently). > > > > I think this won't happen normally because of the physical slot and > hot_standby_feedback but probably can occur in cases like if the user > temporarily switches hot_standby_feedback from on to off. Are there > any other reasons? I think we can mention the cases along with it as > well at least for now. Additionally, I think this should be covered in > code comments as well. > > 2. > #include "postgres.h" > - > +#include "access/genam.h" > > Spurious line removal. > > 3. > A password needs to be provided too, if the sender demands password > authentication. It can be provided in the > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > - <literal>replication</literal> as the database name). > - Do not specify a database name in the > - <varname>primary_conninfo</varname> string. > + <filename>~/.pgpass</filename> file on the standby server. > + </para> > + <para> > + Specify <literal>dbname</literal> in > + <varname>primary_conninfo</varname> string to allow synchronization > + of slots from the primary server to the standby server. > + This will only be used for slot synchronization. It is ignored > + for streaming. > > Is there a reason to remove part of the earlier sentence "use > <literal>replication</literal> as the database name"? > > 4. > + <primary><varname>enable_syncslot</varname> configuration > parameter</primary> > + </indexterm> > + </term> > + <listitem> > + <para> > + It enables a physical standby to synchronize logical failover slots > + from the primary server so that logical subscribers are not blocked > + after failover. > + </para> > + <para> > + It is enabled by default. This parameter can only be set in the > + <filename>postgresql.conf</filename> file or on the server > command line. > + </para> > > I think you forgot to update the documentation for the default value > of this variable. > > 5. > + * a) start the logical replication workers for every enabled subscription > + * when not in standby_mode > + * b) start the slot-sync worker for logical failover slots synchronization > + * from the primary server when in standby_mode. > > Either use a full stop after both lines or none of these. > > 6. > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker); > > There shouldn't be space between * and the worker. > > 7. 
> + if (!SlotSyncWorker->hdr.in_use) > + { > + LWLockRelease(SlotSyncWorkerLock); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("replication slot-sync worker not initialized, " > + "cannot attach"))); > + } > + > + if (SlotSyncWorker->hdr.proc) > + { > + LWLockRelease(SlotSyncWorkerLock); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("replication slot-sync worker is " > + "already running, cannot attach"))); > + } > > Using slot-sync in the error messages looks a bit odd to me. Can we > use "replication slot sync worker ..." in both these and other > similar messages? I think it would be better if we don't split the > messages into multiple lines in these cases as messages don't appear > too long to me. > > 8. > +/* > + * Detach the worker from DSM and update 'proc' and 'in_use'. > + * Logical replication launcher will come to know using these > + * that the worker has shutdown. > + */ > +void > +slotsync_worker_detach(int code, Datum arg) > +{ > > I think the reference to DSM is leftover from the previous version of > the patch. Can we change the above comments as per the new code? > > 9. > +static bool > +slotsync_worker_launch() > { > ... > + /* TODO: do we really need 'generation', analyse more here */ > + worker->hdr.generation++; > > We should do something about this TODO. As per my understanding, we > don't need a generation number for the slot sync worker as we have one > such worker but I guess the patch requires it because we are using > existing logical replication worker infrastructure. This brings the > question of whether we really need a separate SlotSyncWorkerInfo or if > we can use existing LogicalRepWorker and distinguish it with > LogicalRepWorkerType? I guess you didn't use it because most of the > fields in LogicalRepWorker will be unused for slot sync worker. > > 10. > + * Can't use existing functions like 'get_database_oid' from dbcommands.c for > + * validity purpose as they need db connection. > + */ > +static bool > +validate_dbname(const char *dbname) > > I don't know how important it is to validate the dbname before > launching the sync slot worker because anyway after launching, it will > give an error while initializing the connection if the dbname is > invalid. But, if we think it is really required, did you consider > using GetDatabaseTuple()? I have removed 'validate_dbname' in v40. We let dbname go through BackgroundWorkerInitializeConnection() which internally does dbname validation. Later if 'primary_conninfo' is changed and the db name specified in it is different, we exit the worker and let it get restarted which will do the validation again when it does BackgroundWorkerInitializeConnection(). > > -- > With Regards, > Amit Kapila.
Hi, On 11/27/23 9:57 AM, Zhijie Hou (Fujitsu) wrote: > On Monday, November 27, 2023 4:51 PM shveta malik <shveta.malik@gmail.com> wrote: > > Here is the updated version(v39_2) which include all the changes made in 0002. > Please use for review, and sorry for the confusion. > Thanks! As far v39_2-0001: " Altering the failover option of the subscription is currently not permitted. However, this restriction may be lifted in future versions. " Should we mention that we can alter the related replication slot? + <para> + The implementation of failover requires that replication + has successfully finished the initial table synchronization + phase. So even when <literal>failover</literal> is enabled for a + subscription, the internal failover state remains + temporarily <quote>pending</quote> until the initialization phase + completes. See column <structfield>subfailoverstate</structfield> + of <link linkend="catalog-pg-subscription"><structname>pg_subscription</structname></link> + to know the actual failover state. + </para> I think we have a corner case here. If one alter the replication slot on the primary then "subfailoverstate" is not updated accordingly on the subscriber. Given the 2 remarks above would that make sense to prevent altering a replication slot associated to a subscription? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/28/23 10:40 AM, shveta malik wrote: > On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 11/28/23 4:13 AM, shveta malik wrote: >>> On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> >>>> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) >>>> <houzj.fnst@fujitsu.com> wrote: >>>>> >>>>> Here is the updated version(v39_2) which include all the changes made in 0002. >>>>> Please use for review, and sorry for the confusion. >>>>> >>>> >>>> --- a/src/backend/replication/logical/launcher.c >>>> +++ b/src/backend/replication/logical/launcher.c >>>> @@ -8,20 +8,27 @@ >>>> * src/backend/replication/logical/launcher.c >>>> * >>>> * NOTES >>>> - * This module contains the logical replication worker launcher which >>>> - * uses the background worker infrastructure to start the logical >>>> - * replication workers for every enabled subscription. >>>> + * This module contains the replication worker launcher which >>>> + * uses the background worker infrastructure to: >>>> + * a) start the logical replication workers for every enabled subscription >>>> + * when not in standby_mode. >>>> + * b) start the slot sync worker for logical failover slots synchronization >>>> + * from the primary server when in standby_mode. >>>> >>>> I was wondering do we really need a launcher on standby to invoke >>>> sync-slot worker. If so, why? I guess it may be required for previous >>>> versions where we were managing work for multiple slot-sync workers >>>> which is also questionable in the sense of whether launcher is the >>>> right candidate for the same but now with the single slot-sync worker, >>>> it doesn't seem worth having it. What do you think? >>>> >>>> -- >>> >>> Yes, earlier a manager process was needed to manage multiple slot-sync >>> workers and distribute load among them, but now that does not seem >>> necessary. I gave it a try (PoC) and it seems to work well. If there >>> are no objections to this approach, I can share the patch soon. >>> >> >> +1 on this new approach, thanks! > > PFA v40. This patch has removed Logical Replication Launcher support > to launch slotsync worker. Thanks! > The slot-sync worker is now registered as > bgworker with postmaster, with > bgw_start_time=BgWorkerStart_ConsistentState and > bgw_restart_time=60sec. > > On removal of launcher, now all the validity checks have been shifted > to slot-sync worker itself. This brings us to some point of concerns: > > a) We still need to maintain RecoveryInProgress() check in slotsync > worker. Since worker has the start time of > BgWorkerStart_ConsistentState, it will be started on non-standby as > well. So to ensure that it exists on non-standby, "RecoveryInProgress" > has been introduced at the beginning of the worker. But once it exits, > postmaster will not restart it since it will be clean-exist i.e. > proc_exit(0) (the restart logic of postmaster comes into play only > when there is an abnormal exit). But to exit for the first time on > non-standby, we need that Recovery related check in worker. > > b) "enable_syncslot" check is moved to slotsync worker now. Since > enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to > exit the worker if 'enable_syncslot' is found to be disabled. > 'proc_exit(1)' has been used in order to ensure that the worker is > restarted and GUCs are checked again after restart_time. 
Downside of > this approach is, if someone has kept "enable_syncslot" as disabled > permanently even on standby, slotsync worker will keep on restarting > and exiting. > > So to overcome the above pain-points, I think a potential approach > will be to start slotsync worker only if 'enable_syncslot' is on and > the system is non-standby. That makes sense to me. > Potential ways (each with some issues) are: > > 1) Use the current way i.e. register slot-sync worker as bgworker with > postmaster, but introduce extra checks in 'maybe_start_bgworkers'. But > this seems more like a hack. This will need extra changes as currently > once 'maybe_start_bgworkers' is attempted by postmaster, it will > attempt again to start any worker only if the worker had abnormal exit > and restart_time !=0. The current postmatser will not attempt to start > worker on any GUC change. > > 2) Another way maybe to treat slotsync worker as special case and > separate out the start/restart of slotsync worker from bgworker, and > follow what we do for autovacuum launcher(StartAutoVacLauncher) to > keep starting it in the postmaster loop(ServerLoop). In this way, we > may be able to add more checks before starting worker. But by opting > this approach, we will have to manage slotsync worker completely by > ourself as it will be no longer be part of existing > bgworker-registration infra. If this seems okay and there are no other > better options, it can be analyzed further in detail. > > 3) Another approach could be, in order to solve issue (a), introduce a > new start_time 'BgWorkerStart_ConsistentState_HotStandby' which means > start a bgworker only if consistent state is reached and the system is > standby. And for issue (b), lets retain check of enable_syncslot in > the worker itself but make it 'PGC_POSTMASTER'. This will ensure we > can safely exit the worker(proc_exit(0) if enable_syncslot is disabled > and postmaster will not restart it. But I'm not sure if making it > "PGC_POSTMASTER" is acceptable from the user's perspective. I had the same idea (means make enable_syncslot as 'PGC_POSTMASTER') when reading b). I'm +1 on it (at least for V1) as I don't think that this parameter value would change frequently. Curious to know what others think too. Then as far a) is concerned, I'd vote for introducing a new BgWorkerStart_ConsistentState_HotStandby. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Nov 28, 2023 at 2:17 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/2/23 1:27 AM, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, October 31, 2023 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> We have create_replication_slot and drop_replication_slot in repl_gram.y. How > >> about if introduce alter_replication_slot and handle the 'failover' flag with that? > >> The idea is we will either enable 'failover' at the time create_replication_slot by > >> providing an optional failover option or execute a separate command > >> alter_replication_slot. I think we probably need to perform this command > >> before the start of streaming. > > > > Here is an attempt to achieve the same. I added a new replication command > > alter_replication_slot and introduced a walreceiver api walrcv_alter_slot to > > execute the command. The subscription will call the api to enable/disable > > the failover of the slot on publisher. > > > > The patch disallows altering the failover option for the subscription. But we > > could release the restriction by using the following approaches in next version: > > > >> I think we will have the following options to allow alter of the 'failover' > >> property: (a) we can allow altering 'failover' only for the 'disabled' > >> subscription; to achieve that, we need to open a connection during alter > >> subscription and change this property of slot; (b) apply worker detects the > >> change in 'failover' option; run the alter_replication_slot command; this needs > >> more analysis as apply_worker is already doing streaming and changing slot > >> property in between could be tricky. > > > > What do you think about also adding a pg_alter_logical_replication_slot() or such > function? > > That would allow users to alter manually created logical replication slots without > the need to make a replication connection. > But then won't that make it inconsistent with the subscription failover state? I think if we don't have a simple solution for this, we can always do it as an enhancement to the main feature once we have good ideas to solve it. -- With Regards, Amit Kapila.
Hi. Here are some review comments for the patch v39_2-0001. Multiple items from my previous review [1] seemed unanswered, so it wasn't clear if they were discarded because they were wrong or maybe accidentally missed. I've repeated all those again here, as well as some new comments. ====== 1. General. Previously (see [1] #0) I asked a question about whether there is some documentation missing. Seems not yet answered. ====== Commit message 2. Users can set this flag during CREATE SUBSCRIPTION or during pg_create_logical_replication_slot API. Examples: CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub WITH (failover = true); (failover is the last arg) SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput', false, true, true); ~ I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something similar) to better indicate where these examples start and finish, otherwise they just sort of get lost among the text. ====== doc/src/sgml/catalogs.sgml 3. From previous review ([1] #6) I suggested reordering fields. Hou-san wrote: "but I think the functionality of two fields are different and I didn’t find the correlation between two fields except for the type of value." Yes, that is true. OTOH, I felt grouping the attributes by the same types made the docs easier to read. ====== src/backend/commands/subscriptioncmds.c 4. CreateSubscription + /* + * If only the slot_name is specified (without create_slot option), + * it is possible that the user intends to use an existing slot on + * the publisher, so here we enable failover for the slot if + * requested. + */ + else if (opts.slot_name && failover_enabled) + { Unanswered question from previous review (see [1] #11a). i.e. How does this condition ensure that *only* the slot name was specified (like the comment is saying)? ~~~ 5. AlterSubscription errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when two_phase is enabled"), errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover is enabled"), + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); + There are translation issues, the same as reported in my previous review (see [1] #12b and also several other places as noted in [1]). Hou-san replied that I "can start a separate thread to change the twophase related messages, and we can change accordingly if it's accepted.", but that's not right IMO because it is only the fact that this syncslot patch is reusing a similar message that warrants the need to extract a "common" message part in the first place. So I think it is the responsibility of this syncslot patch to make this change. ====== src/backend/replication/logical/tablesync.c 6. 
process_syncing_tables_for_apply + if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING) + ereport(LOG, + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that two_phase can be enabled", + MySubscription->name))); + + if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING) + ereport(LOG, + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that failover can be enabled", + MySubscription->name))); 6a. You may end up logging 2 restart messages for the same restart. Is that OK? ~ 6b. This is another example where you should share the same common message (for fewer translations) ====== src/backend/replication/logical/worker.c 7. + * The logical slot on the primary can be synced to the standby by specifying + * the failover = true when creating the subscription. Enabling failover allows + * us to smoothly transition to the standby in case the primary gets promoted, + * ensuring that we can subscribe to the new primary without losing any data. /the failover = true/the failover = true option/ or /the failover = true/failover = true/ ~~~ 8. + #include "postgres.h" Unnecessary extra blank line ====== src/backend/replication/slot.c 9. validate_standby_slots There was no reply to the comment in my previous review (see [1] #27). Maybe you disagree or maybe it was accidentally overlooked? ~~~ 10. check_standby_slot_names In previous review I asked ([1] #28) why a special check was needed for "*". Hou-san replied that "SplitIdentifierString() does not give error for '*' and '*' can be considered as valid value which if accepted can mislead user". Sure, but won't the code then just try to find if there is a replication slot called "*" and that will fail? That was my point, if the slot name lookup is going to fail anyway then why have the extra code for the special "*" case up-front? Note -- I haven't tried it, so maybe code doesn't work like I think it does. ====== src/backend/replication/walsender.c 11. PhysicalWakeupLogicalWalSnd No reply to my previous review comment ([1] #33). Not done? Disagreed, or accidentally missed? ~~~ 12. WalSndFilterStandbySlots + /* + * If logical slot name is given in standby_slot_names, give WARNING + * and skip it. Since it is harmless, so WARNING should be enough, no + * need to error-out. + */ + else if (SlotIsLogical(slot)) + warningfmt = _("cannot have logical replication slot \"%s\" in parameter \"%s\", ignoring"); I previously raised an issue (see [1] #35) thinking this could not happen. Hou-san explained how it might happen ("user could drop the physical slot and recreate a logical slot with the same name without changing the GUC.") so this code was necessary. That is OK, but I think that same explanation should also go into the code comment. ~~~ 13. WalSndFilterStandbySlots + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc); I previously raised an issue (see [1] #36). Hou-san replied "I think it's OK to remove slots if it's invalidated, dropped, or was changed to logical one as we don't need to wait for these slots to catch up anymore." Sure, maybe the code is fine, but my point was that the code is removing elements in *more* scenarios than are mentioned by the function comment, so maybe update that function comment to cover all the removal scenarios. ~~~ 14. WalSndWaitForStandbyConfirmation The comment change from my previous review ([1] #37) not done. Disagreed, or accidentally missed? ~~~ 15. 
WalSndWaitForStandbyConfirmation The question about calling ConditionVariablePrepareToSleep in my previous review ([1] #39) not answered. Accidentally missed? ~~~ 16. WalSndWaitForWal if (RecentFlushPtr != InvalidXLogRecPtr && loc <= RecentFlushPtr) - return RecentFlushPtr; + { + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots); It is better to use XLogRecPtrIsInvalid macro here. I know it was not strictly added by your patch, but so much else changed nearby so I thought this should be fixed at the same time. ====== src/bin/pg_upgrade/info.c 17. get_old_cluster_logical_slot_infos + slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots); Excessive whitespace. ====== [1] My previous review of v35-0001. https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD55bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Nov 28, 2023 at 9:28 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/28/23 10:40 AM, shveta malik wrote: > > On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 11/28/23 4:13 AM, shveta malik wrote: > >>> On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>> > >>>> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) > >>>> <houzj.fnst@fujitsu.com> wrote: > >>>>> > >>>>> Here is the updated version(v39_2) which include all the changes made in 0002. > >>>>> Please use for review, and sorry for the confusion. > >>>>> > >>>> > >>>> --- a/src/backend/replication/logical/launcher.c > >>>> +++ b/src/backend/replication/logical/launcher.c > >>>> @@ -8,20 +8,27 @@ > >>>> * src/backend/replication/logical/launcher.c > >>>> * > >>>> * NOTES > >>>> - * This module contains the logical replication worker launcher which > >>>> - * uses the background worker infrastructure to start the logical > >>>> - * replication workers for every enabled subscription. > >>>> + * This module contains the replication worker launcher which > >>>> + * uses the background worker infrastructure to: > >>>> + * a) start the logical replication workers for every enabled subscription > >>>> + * when not in standby_mode. > >>>> + * b) start the slot sync worker for logical failover slots synchronization > >>>> + * from the primary server when in standby_mode. > >>>> > >>>> I was wondering do we really need a launcher on standby to invoke > >>>> sync-slot worker. If so, why? I guess it may be required for previous > >>>> versions where we were managing work for multiple slot-sync workers > >>>> which is also questionable in the sense of whether launcher is the > >>>> right candidate for the same but now with the single slot-sync worker, > >>>> it doesn't seem worth having it. What do you think? > >>>> > >>>> -- > >>> > >>> Yes, earlier a manager process was needed to manage multiple slot-sync > >>> workers and distribute load among them, but now that does not seem > >>> necessary. I gave it a try (PoC) and it seems to work well. If there > >>> are no objections to this approach, I can share the patch soon. > >>> > >> > >> +1 on this new approach, thanks! > > > > PFA v40. This patch has removed Logical Replication Launcher support > > to launch slotsync worker. > > Thanks! > > > The slot-sync worker is now registered as > > bgworker with postmaster, with > > bgw_start_time=BgWorkerStart_ConsistentState and > > bgw_restart_time=60sec. > > > > On removal of launcher, now all the validity checks have been shifted > > to slot-sync worker itself. This brings us to some point of concerns: > > > > a) We still need to maintain RecoveryInProgress() check in slotsync > > worker. Since worker has the start time of > > BgWorkerStart_ConsistentState, it will be started on non-standby as > > well. So to ensure that it exists on non-standby, "RecoveryInProgress" > > has been introduced at the beginning of the worker. But once it exits, > > postmaster will not restart it since it will be clean-exist i.e. > > proc_exit(0) (the restart logic of postmaster comes into play only > > when there is an abnormal exit). But to exit for the first time on > > non-standby, we need that Recovery related check in worker. > > > > b) "enable_syncslot" check is moved to slotsync worker now. 
Since > > enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to > > exit the worker if 'enable_syncslot' is found to be disabled. > > 'proc_exit(1)' has been used in order to ensure that the worker is > > restarted and GUCs are checked again after restart_time. Downside of > > this approach is, if someone has kept "enable_syncslot" as disabled > > permanently even on standby, slotsync worker will keep on restarting > > and exiting. > > > > So to overcome the above pain-points, I think a potential approach > > will be to start slotsync worker only if 'enable_syncslot' is on and > > the system is non-standby. > > That makes sense to me. > > > Potential ways (each with some issues) are: > > > > 1) Use the current way i.e. register slot-sync worker as bgworker with > > postmaster, but introduce extra checks in 'maybe_start_bgworkers'. But > > this seems more like a hack. This will need extra changes as currently > > once 'maybe_start_bgworkers' is attempted by postmaster, it will > > attempt again to start any worker only if the worker had abnormal exit > > and restart_time !=0. The current postmatser will not attempt to start > > worker on any GUC change. > > > > 2) Another way maybe to treat slotsync worker as special case and > > separate out the start/restart of slotsync worker from bgworker, and > > follow what we do for autovacuum launcher(StartAutoVacLauncher) to > > keep starting it in the postmaster loop(ServerLoop). In this way, we > > may be able to add more checks before starting worker. But by opting > > this approach, we will have to manage slotsync worker completely by > > ourself as it will be no longer be part of existing > > bgworker-registration infra. If this seems okay and there are no other > > better options, it can be analyzed further in detail. > > > > 3) Another approach could be, in order to solve issue (a), introduce a > > new start_time 'BgWorkerStart_ConsistentState_HotStandby' which means > > start a bgworker only if consistent state is reached and the system is > > standby. And for issue (b), lets retain check of enable_syncslot in > > the worker itself but make it 'PGC_POSTMASTER'. This will ensure we > > can safely exit the worker(proc_exit(0) if enable_syncslot is disabled > > and postmaster will not restart it. But I'm not sure if making it > > "PGC_POSTMASTER" is acceptable from the user's perspective. > > I had the same idea (means make enable_syncslot as 'PGC_POSTMASTER') > when reading b). I'm +1 on it (at least for V1) as I don't think that > this parameter value would change frequently. Curious to know what others > think too. > > Then as far a) is concerned, I'd vote for introducing a new > BgWorkerStart_ConsistentState_HotStandby. > +1 on PGC_POSTMASTER and BgWorkerStart_ConsistentState_HotStandby. A clean solution as compared to the rest of the approaches. Will implement it. thanks Shveta
On Tuesday, November 28, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: Hi, > On 11/27/23 9:57 AM, Zhijie Hou (Fujitsu) wrote: > > On Monday, November 27, 2023 4:51 PM shveta malik > <shveta.malik@gmail.com> wrote: > > > > Here is the updated version(v39_2) which include all the changes made in > 0002. > > Please use for review, and sorry for the confusion. > > > > Thanks! > > As far v39_2-0001: > > " > Altering the failover option of the subscription is currently not > permitted. However, this restriction may be lifted in future versions. > " > > Should we mention that we can alter the related replication slot? Will add. > > + <para> > + The implementation of failover requires that replication > + has successfully finished the initial table synchronization > + phase. So even when <literal>failover</literal> is enabled for a > + subscription, the internal failover state remains > + temporarily <quote>pending</quote> until the initialization > phase > + completes. See column > <structfield>subfailoverstate</structfield> > + of <link > linkend="catalog-pg-subscription"><structname>pg_subscription</structna > me></link> > + to know the actual failover state. > + </para> > > I think we have a corner case here. If one alter the replication slot on the > primary then "subfailoverstate" is not updated accordingly on the subscriber. > Given the 2 remarks above would that make sense to prevent altering a > replication slot associated to a subscription? Thanks for the review! I think we could not distinguish the user created logical slot or subscriber created slot as there is no related info in slot's data. And user could change the slot on subscription by "alter sub set (slot_name)", so maintaining this info would need some efforts. Besides, I think this case overlaps the previous discussed "alter sub set (slot_name)" issue[1]. Both the cases are because the slot's failover is different from the subscription's failover setting. I think we could handle them similarly that user need to take care of not changing the failover to wrong value. Or do you prefer another approach that mentioned in that thread[1] ? (always alter the slot at the startup of apply worker). [1] https://www.postgresql.org/message-id/564b195a-180c-42e9-902b-b1a8b50218ee%40gmail.com Best Regards, Hou zj
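For reference, a small sketch of how a subscriber could inspect the failover state being discussed, assuming the subfailoverstate column proposed by this patch; the single-letter codes in the comment are an assumption modeled on subtwophasestate, not confirmed by the patch text:

    -- On the subscriber: check whether failover is still pending ('p' assumed)
    -- or enabled ('e' assumed) for each subscription.
    SELECT subname, subfailoverstate FROM pg_subscription;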
On Tuesday, November 28, 2023 11:58 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/28/23 10:40 AM, shveta malik wrote: > > On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 11/28/23 4:13 AM, shveta malik wrote: > >>> On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > >>>> > >>>> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu) > >>>> <houzj.fnst@fujitsu.com> wrote: > >>>>> > >>>>> Here is the updated version(v39_2) which include all the changes made > in 0002. > >>>>> Please use for review, and sorry for the confusion. > >>>>> > >>>> > >>>> --- a/src/backend/replication/logical/launcher.c > >>>> +++ b/src/backend/replication/logical/launcher.c > >>>> @@ -8,20 +8,27 @@ > >>>> * src/backend/replication/logical/launcher.c > >>>> * > >>>> * NOTES > >>>> - * This module contains the logical replication worker launcher which > >>>> - * uses the background worker infrastructure to start the logical > >>>> - * replication workers for every enabled subscription. > >>>> + * This module contains the replication worker launcher which > >>>> + * uses the background worker infrastructure to: > >>>> + * a) start the logical replication workers for every enabled > subscription > >>>> + * when not in standby_mode. > >>>> + * b) start the slot sync worker for logical failover slots > synchronization > >>>> + * from the primary server when in standby_mode. > >>>> > >>>> I was wondering do we really need a launcher on standby to invoke > >>>> sync-slot worker. If so, why? I guess it may be required for > >>>> previous versions where we were managing work for multiple > >>>> slot-sync workers which is also questionable in the sense of > >>>> whether launcher is the right candidate for the same but now with > >>>> the single slot-sync worker, it doesn't seem worth having it. What do you > think? > >>>> > >>>> -- > >>> > >>> Yes, earlier a manager process was needed to manage multiple > >>> slot-sync workers and distribute load among them, but now that does > >>> not seem necessary. I gave it a try (PoC) and it seems to work well. > >>> If there are no objections to this approach, I can share the patch soon. > >>> > >> > >> +1 on this new approach, thanks! > > > > PFA v40. This patch has removed Logical Replication Launcher support > > to launch slotsync worker. > > Thanks! > > > The slot-sync worker is now registered as bgworker with postmaster, > > with bgw_start_time=BgWorkerStart_ConsistentState and > > bgw_restart_time=60sec. > > > > On removal of launcher, now all the validity checks have been shifted > > to slot-sync worker itself. This brings us to some point of concerns: > > > > a) We still need to maintain RecoveryInProgress() check in slotsync > > worker. Since worker has the start time of > > BgWorkerStart_ConsistentState, it will be started on non-standby as > > well. So to ensure that it exists on non-standby, "RecoveryInProgress" > > has been introduced at the beginning of the worker. But once it exits, > > postmaster will not restart it since it will be clean-exist i.e. > > proc_exit(0) (the restart logic of postmaster comes into play only > > when there is an abnormal exit). But to exit for the first time on > > non-standby, we need that Recovery related check in worker. > > > > b) "enable_syncslot" check is moved to slotsync worker now. 
Since > > enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to > > exit the worker if 'enable_syncslot' is found to be disabled. > > 'proc_exit(1)' has been used in order to ensure that the worker is > > restarted and GUCs are checked again after restart_time. Downside of > > this approach is, if someone has kept "enable_syncslot" as disabled > > permanently even on standby, slotsync worker will keep on restarting > > and exiting. > > > > So to overcome the above pain-points, I think a potential approach > > will be to start slotsync worker only if 'enable_syncslot' is on and > > the system is non-standby. > > That makes sense to me. > > > Potential ways (each with some issues) are: > > > > 1) Use the current way i.e. register slot-sync worker as bgworker with > > postmaster, but introduce extra checks in 'maybe_start_bgworkers'. But > > this seems more like a hack. This will need extra changes as currently > > once 'maybe_start_bgworkers' is attempted by postmaster, it will > > attempt again to start any worker only if the worker had abnormal exit > > and restart_time !=0. The current postmatser will not attempt to start > > worker on any GUC change. > > > > 2) Another way maybe to treat slotsync worker as special case and > > separate out the start/restart of slotsync worker from bgworker, and > > follow what we do for autovacuum launcher(StartAutoVacLauncher) to > > keep starting it in the postmaster loop(ServerLoop). In this way, we > > may be able to add more checks before starting worker. But by opting > > this approach, we will have to manage slotsync worker completely by > > ourself as it will be no longer be part of existing > > bgworker-registration infra. If this seems okay and there are no other > > better options, it can be analyzed further in detail. > > > > 3) Another approach could be, in order to solve issue (a), introduce a > > new start_time 'BgWorkerStart_ConsistentState_HotStandby' which means > > start a bgworker only if consistent state is reached and the system is > > standby. And for issue (b), lets retain check of enable_syncslot in > > the worker itself but make it 'PGC_POSTMASTER'. This will ensure we > > can safely exit the worker(proc_exit(0) if enable_syncslot is disabled > > and postmaster will not restart it. But I'm not sure if making it > > "PGC_POSTMASTER" is acceptable from the user's perspective. > > I had the same idea (means make enable_syncslot as 'PGC_POSTMASTER') > when reading b). I'm +1 on it (at least for V1) as I don't think that this parameter > value would change frequently. Curious to know what others think too. > > Then as far a) is concerned, I'd vote for introducing a new > BgWorkerStart_ConsistentState_HotStandby. Here is the V41 patch set which includes the following changes. V41-0001: 1) Based on the discussion[1], I update the document to remind user to change the slot's failover option when ALTER SUBSCRIPTION ... SET (slot_name = xx). 2) Address comments in [2][3][4]. V41-0002: 1) 'enable_syncslot' is changed from PGC_SIGHUP to PGC_POSTMASTER, slot-sync worker will now clean exit (proc_exit(0)) if enable_syncslot is found disabled. 2) BgWorkerStart_ConsistentState_HotStandby is introduced as new start-time for bgworker. This will start worker only if it is standby_mode and consistent state is reached. 3) 'SYNCSLOT_STATE_INITIATED' is now set in 'ReplicationSlotCreate' itself in slot-sync worker case. 
Earlier it was set at a later point in time, leaving a window during which even a synced slot stayed in the 'n' state for quite some time; that was not correct. Thanks Shveta for working on the V41-0002. [1] https://www.postgresql.org/message-id/CAA4eK1Jd9dk%3D5POTKM9p4EyYqYzLXe-AnLzHrUELjzZScLz7mw%40mail.gmail.com [2] https://www.postgresql.org/message-id/eb09f682-db82-41cd-93bc-5d44e10e1d6d%40gmail.com [3] https://www.postgresql.org/message-id/CAHut%2BPsuSWjm7U_sVnL8FXZ7ZQcfCcT44kAK7i6qMG35Cwjy3A%40mail.gmail.com [4] https://www.postgresql.org/message-id/CAFPTHDbFqLgXS6Et%2BshNGPDjCKK66C%2BZSarqFHmQvfnAah3Qsw%40mail.gmail.com Best Regards, Hou zj
Attachment
On Wednesday, November 29, 2023 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote: Thanks for the comments. > ====== > 1. General. > > Previously (see [1] #0) I asked a question about if there is some documentation > missing. Seems not yet answered. The document was add in V39-0002 in logicaldecoding.sgml because some necessary GUCs for slotsync are not in 0001. > > ====== > Commit message > > 2. > Users can set this flag during CREATE SUBSCRIPTION or during > pg_create_logical_replication_slot API. Examples: > > CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub WITH > (failover = true); > > (failover is the last arg) > SELECT * FROM pg_create_logical_replication_slot('myslot', > 'pgoutput', false, true, true); > > ~ > > I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something > similar) to indicate better where these examples start and finish, otherwise they > just sort of get lost among the text. Changed. > > ====== > doc/src/sgml/catalogs.sgml > > 3. > From previous review ([1] #6) I suggested reordering fields. Hous-san > wrote: "but I think the functionality of two fields are different and I didn’t find > the correlation between two fields except for the type of value." > > Yes, that is true. OTOH, I felt grouping the attributes by the same types made > the docs easier to read. The document's order should be same as the pg_subscription catalog, and I prefer not to move the new subfailoverstate in the middle of catalog as the functionality of them is different. > > ====== > src/backend/commands/subscriptioncmds.c > > 4. CreateSubscription > > + /* > + * If only the slot_name is specified (without create_slot option), > + * it is possible that the user intends to use an existing slot on > + * the publisher, so here we enable failover for the slot if > + * requested. > + */ > + else if (opts.slot_name && failover_enabled) { > > Unanswered question from previous review (see [1] #11a). i.e. How does this > condition ensure that *only* the slot name was specified (like the comment is > saying)? It is the else part of 'if (opts.create_slot)', so it means create_slot is not specified while only slot_name is specified. I have improved the comment. > > ~~~ > > 5. AlterSubscription > > errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when two_phase is enabled"), > errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false, > or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); > > + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && > + opts.copy_data) ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed > when failover is enabled"), > + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = > false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); > + > > There are translations issues same as reported in my previous review (see [1] > #12b and also several other places as noted in [1]). Hou-san replied that I "can > start a separate thread to change the twophase related messages, and we can > change accordingly if it's accepted.", but that's not right IMO because it is only > the fact that this sysncslot patch is reusing a similar message that warrants the > need to extract a "common" message part in the first place. So I think it is > responsibility if this sycslot patch to make this change. OK, changed. > > ====== > src/backend/replication/logical/tablesync.c > > 6. 
process_syncing_tables_for_apply > > + if (MySubscription->twophasestate == > + LOGICALREP_TWOPHASE_STATE_PENDING) > + ereport(LOG, > + (errmsg("logical replication apply worker for subscription \"%s\" > will restart so that two_phase can be enabled", > + MySubscription->name))); > + > + if (MySubscription->failoverstate == > + LOGICALREP_FAILOVER_STATE_PENDING) > + ereport(LOG, > + (errmsg("logical replication apply worker for subscription \"%s\" > will restart so that failover can be enabled", > + MySubscription->name))); > > 6a. > You may end up log 2 restart messages for the same restart. Is it OK? I think it's OK as it can provide complete information. > > ~ > > 6b. > This is another example where you should share the same common message > (for less translations) I adjusted the message there. > > ====== > src/backend/replication/logical/worker.c > > 7. > + * The logical slot on the primary can be synced to the standby by > + specifying > + * the failover = true when creating the subscription. Enabling > + failover allows > + * us to smoothly transition to the standby in case the primary gets > + promoted, > + * ensuring that we can subscribe to the new primary without losing any data. > > /the failover = true/the failover = true option/ > > or > > /the failover = true/failover = true/ > Changed. > ~~~ > > 8. > + > #include "postgres.h" > > Unnecessary extra blank line Removed. > > ====== > src/backend/replication/slot.c > > 9. validate_standby_slots > > There was no reply to the comment in my previous review (see [1] #27). > Maybe you disagree or maybe accidentally overlooked? The error message has already been adjusted in V39. I adjusted the check in this version as well to be consistent. > > ~~~ > > 10. check_standby_slot_names > > In previous review I asked ([1] #28) why a special check was needed for "*". > Hou-san replied that "SplitIdentifierString() does not give error for '*' and '*' > can be considered as valid value which if accepted can mislead user". > > Sure, but won't the code then just try to find if there is a replication slot called > "*" and that will fail. That was my point, if the slot name lookup is going to fail > anyway then why have the extra code for the special "*" case up-front? Note -- > I haven't tried it, so maybe code doesn't work like I think it does. I think allowing "*" can mislead user because it normally means every slot, but we don't want to support the "every slot" option as mentioned in the comment. So I think reject it here is fine. Reporting ERROR because the slot named '*' was not there may look confusing. > > ====== > src/backend/replication/walsender.c > > 11. PhysicalWakeupLogicalWalSnd > > No reply to my previous review comment ([1] #33). Not done? Disagreed, or > accidentally missed? The function mentioned in your previous comment has been removed in previous version, so I am not sure are you pointing to some other codes that has similar issues ? > > ~~~ > > 12. WalSndFilterStandbySlots > > + /* > + * If logical slot name is given in standby_slot_names, give WARNING > + * and skip it. Since it is harmless, so WARNING should be enough, no > + * need to error-out. > + */ > + else if (SlotIsLogical(slot)) > + warningfmt = _("cannot have logical replication slot \"%s\" in > parameter \"%s\", ignoring"); > > I previously raised an issue (see [1] #35) thinking this could not happen. 
> Hou-san explained how it might happen ("user could drop the logical slot and > recreate a physical slot with the same name without changing the GUC.") so this > code was necessary. That is OK, but I think your same explanation in the code > commen. OK, I have added comments here. > > ~~~ > > 13. WalSndFilterStandbySlots > > + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc); > > I previously raised issue (see [1] #36). Hou-san replied "I think it's OK to remove > slots if it's invalidated, dropped, or was changed to logical one as we don't > need to wait for these slots to catch up anymore." > > Sure, maybe code is fine, but my point was that the code is removing elements > *more* scenarios than are mentioned by the function comment, so maybe > update that function comment for all the removal scenarios. Updated the comments. > > ~~~ > > 14. WalSndWaitForStandbyConfirmation > > The comment change from my previous review ([1] #37) not done. > Disagreed, or accidentally missed? Thanks for pointing, this was missed. > > ~~~ > > 15. WalSndWaitForStandbyConfirmation > > The question about calling ConditionVariablePrepareToSleep in my previous > review ([1] #39) not answered. Accidentally missed? I think V39 has already adjusted the order of reload and NIL check in this function. > > ~~~ > > 16. WalSndWaitForWal > > if (RecentFlushPtr != InvalidXLogRecPtr && > loc <= RecentFlushPtr) > - return RecentFlushPtr; > + { > + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots); > > It is better to use XLogRecPtrIsInvalid macro here. I know it was not strictly > added by your patch, but so much else changed nearby so I thought this should > be fixed at the same time. Changed. > > ====== > src/bin/pg_upgrade/info.c > > 17. get_old_cluster_logical_slot_infos > > + > slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots); > > Excessive whitespace. Removed. Best Regards, Hou zj
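Pulling together the user-facing pieces touched on in this review, here is a hedged sketch using the syntax quoted from the commit message and the standby_slot_names GUC proposed by the patch (none of this exists in released PostgreSQL; the position of the failover flag follows the commit message example, and whether standby_slot_names is reloadable is an assumption based on the reload handling discussed for WalSndWaitForStandbyConfirmation):

    -- Ex1: enable failover when creating the subscription.
    CREATE SUBSCRIPTION mysub CONNECTION 'host=primary dbname=postgres'
        PUBLICATION mypub WITH (failover = true);

    -- Ex2: enable failover when creating the slot directly; the last argument
    -- is the proposed failover flag.
    SELECT * FROM pg_create_logical_replication_slot('myslot', 'pgoutput', false, true, true);

    -- On the primary: make failover-enabled walsenders wait for these physical
    -- slots; a logical slot name listed here only draws a WARNING and is ignored.
    ALTER SYSTEM SET standby_slot_names = 'standby1_phys';
    SELECT pg_reload_conf();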
On Thursday, November 23, 2023 6:06 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Tue, Nov 21, 2023 at 8:32 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd, > > rebased the patches. PFA v37_2 patches. > > Thanks for the patch. Some comments: Thanks for the comments. > subscriptioncmds.c: > CreateSubscription() > and tablesync.c: > process_syncing_tables_for_apply() > walrcv_create_slot(wrconn, opts.slot_name, false, > twophase_enabled, > - CRS_NOEXPORT_SNAPSHOT, NULL); > - > - if (twophase_enabled) > - UpdateTwoPhaseState(subid, > LOGICALREP_TWOPHASE_STATE_ENABLED); > - > + failover_enabled, > CRS_NOEXPORT_SNAPSHOT, NULL); > > either here or in libpqrcv_create_slot(), shouldn't you check the remote server > version if it supports the failover flag? I think we expect the create slot to fail if the server doesn't support failover. The same is true for two_phase option. > > > + > + /* > + * If only the slot_name is specified, it is possible > that the user intends to > + * use an existing slot on the publisher, so here we > enable failover for the > + * slot if requested. > + */ > + else if (opts.slot_name && failover_enabled) > + { > + walrcv_alter_slot(wrconn, opts.slot_name, opts.failover); > + ereport(NOTICE, > + (errmsg("enabled failover for replication > slot \"%s\" on publisher", > + opts.slot_name))); > + } > > Here, the code only alters the slot if failover = true. You could use "else if > (opts.slot_name && IsSet(opts.specified_opts, SUBOPT_FAILOVER)" to check if > the failover flag is specified and alter for failover=false as well. Adjusted. > Also, shouldn't > you check for the server version if the command ALTER_REPLICATION_SLOT is > supported? Similar to create_slot, we expect the command fail if the target server doesn't support failover the user specified failover = true. > > slot.c: > ReplicationSlotAlter() > > +void > +ReplicationSlotAlter(const char *name, bool failover) { > + Assert(MyReplicationSlot == NULL); > + > + ReplicationSlotAcquire(name, true); > + > + if (SlotIsPhysical(MyReplicationSlot)) > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot use %s with a physical replication slot", > + "ALTER_REPLICATION_SLOT")); > > shouldn't you release the slot by calling ReplicationSlotRelease before > erroring out? No, I think the release of the replication slot will be handled by either WalSndErrorCleanup, ReplicationSlotShmemExit, or the ReplicationSlotRelease call in PostgresMain(). > > slot.c: > +/* > + * A helper function to validate slots specified in standby_slot_names GUCs. > + */ > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + ListCell *lc; > + > + /* Need a modifiable copy of string */ > + rawname = pstrdup(*newval); > > rawname is not always freed. The code has been changed due to other comments. > > launcher.c: > > + SlotSyncWorker->hdr.proc = MyProc; > + > + before_shmem_exit(slotsync_worker_detach, (Datum) 0); > + > + LWLockRelease(SlotSyncWorkerLock); > +} > > before_shmem_exit() can error out leaving the lock acquired. Maybe you > should release the lock prior to calling before_shmem_exit() because you don't > need the lock there. This has been fixed. Best Regards, Hou zj
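Since this review leans on the new ALTER_REPLICATION_SLOT replication command, here is a small sketch of exercising it directly over a replication connection, in the same style as the replication-connection example shown later in the thread (the command and its FAILOVER option come from this patch series; the slot name is made up):

    $ psql "dbname=postgres replication=database"
    postgres=# ALTER_REPLICATION_SLOT myslot (FAILOVER true);
    ALTER_REPLICATION_SLOT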
Hi, On 11/29/23 6:58 AM, Zhijie Hou (Fujitsu) wrote: > On Tuesday, November 28, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > >> On 11/27/23 9:57 AM, Zhijie Hou (Fujitsu) wrote: >>> On Monday, November 27, 2023 4:51 PM shveta malik >> <shveta.malik@gmail.com> wrote: >>> >>> Here is the updated version(v39_2) which include all the changes made in >> 0002. >>> Please use for review, and sorry for the confusion. >>> >> >> Thanks! >> >> As far v39_2-0001: >> >> " >> Altering the failover option of the subscription is currently not >> permitted. However, this restriction may be lifted in future versions. >> " >> >> Should we mention that we can alter the related replication slot? > > Will add. > >> >> + <para> >> + The implementation of failover requires that replication >> + has successfully finished the initial table synchronization >> + phase. So even when <literal>failover</literal> is enabled for a >> + subscription, the internal failover state remains >> + temporarily <quote>pending</quote> until the initialization >> phase >> + completes. See column >> <structfield>subfailoverstate</structfield> >> + of <link >> linkend="catalog-pg-subscription"><structname>pg_subscription</structna >> me></link> >> + to know the actual failover state. >> + </para> >> >> I think we have a corner case here. If one alter the replication slot on the >> primary then "subfailoverstate" is not updated accordingly on the subscriber. >> Given the 2 remarks above would that make sense to prevent altering a >> replication slot associated to a subscription? > > Thanks for the review! > > I think we could not distinguish the user created logical slot or subscriber > created slot as there is no related info in slot's data. Yeah that would need extra work. > And user could change > the slot on subscription by "alter sub set (slot_name)", so maintaining this info > would need some efforts. > Yes. > Besides, I think this case overlaps the previous discussed "alter sub set > (slot_name)" issue[1]. Both the cases are because the slot's failover is > different from the subscription's failover setting. Yeah agree. > I think we could handle > them similarly that user need to take care of not changing the failover to > wrong value. Or do you prefer another approach that mentioned in that thread[1] > ? (always alter the slot at the startup of apply worker). > I think I'm fine with documenting the fact that the user should not change the failover value. But if he does change it (because at the end nothing prevents it to do so) then I think the meaning of subfailoverstate should still make sense. One way to achieve this could be to change its meaning? Say rename it to say subfailovercreationstate (to reflect the fact that it was the state at the creation time) and change messages like: " ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover is enabled " to something like " ALTER SUBSCRIPTION with refresh and copy_data is not allowed for subscription created with failover enabled" " and change the doc accordingly. What do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 11/29/23 3:58 AM, Amit Kapila wrote: > On Tue, Nov 28, 2023 at 2:17 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> What do you think about also adding a pg_alter_logical_replication_slot() or such >> function? >> >> That would allow users to alter manually created logical replication slots without >> the need to make a replication connection. >> > > But then won't that make it inconsistent with the subscription > failover state? Do you mean allowing one to use pg_alter_logical_replication_slot() on a slot linked to a subscription? (while we're saying / documenting to not alter such slots?) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Here are some review comments for v41-0001. ====== doc/src/sgml/ref/alter_subscription.sgml 1. + <para> + When altering the + <link linkend="sql-createsubscription-params-with-slot-name"><literal>slot_name</literal></link>, + the <literal>failover</literal> property of the new slot may differ from the + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + parameter specified in the subscription, you need to adjust the + <literal>failover</literal> property when creating the slot so that it + matches the value specified in subscription. + </para> For the second part a) it should be a separate sentence, and b) IMO you are not really "adjusting" something if you are "creating" it. SUGGESTION When altering the slot_name, the failover property of the new slot may differ from the failover parameter specified in the subscription. When creating the slot ensure the slot failover property matches the failover parameter value of the subscription. ====== src/backend/catalog/pg_subscription.c 2. AlterSubscription + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover is enabled"), This is another example where the "two_phase" and "failover" should be extracted to make a common message for the translator. (Already posted this comment before -- see [1] #13) ~~~ 3. + /* + * See comments above for twophasestate, same holds true for + * 'failover' + */ + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed when failover is enabled"), + errhint("Use ALTER SUBSCRIPTION ... REFRESH with copy_data = false, or use DROP/CREATE SUBSCRIPTION."))); This is another example where the "two_phase" and "failover" should be extracted to make a common message for the translator. (Already posted this comment before -- see [1] #14) ====== src/backend/replication/walsender.c 4. +/* + * Wake up logical walsenders with failover-enabled slots if the physical slot + * of the current walsender is specified in standby_slot_names GUC. + */ +void +PhysicalWakeupLogicalWalSnd(void) Is it better to refer to "walsender processes" being woken instead of just walsenders? e.g. "Wake up logical walsenders..." --> "Wake up the logical walsender processes..." ====== [1] v35-0001 review. https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD55bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
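To illustrate the documentation wording suggested above, a hedged example of switching a subscription to a new slot while keeping the failover property consistent (slot and subscription names are made up; the fifth argument is the failover flag as proposed by the patch):

    -- On the publisher: create the replacement slot with failover enabled so it
    -- matches the subscription's failover setting.
    SELECT pg_create_logical_replication_slot('myslot_new', 'pgoutput', false, false, true);

    -- On the subscriber: point the subscription at the new slot.
    ALTER SUBSCRIPTION mysub SET (slot_name = 'myslot_new');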
On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > This has been fixed. > > Best Regards, > Hou zj Thanks for addressing my comments. Some comments from my testing of patch v41 1. In my opinion, the second message "aborting the wait...moving to the next slot" does not hold much value. There might not even be a "next slot", there might be just one slot. I think the first LOG is enough to indicate that the sync-slot is waiting as it repeats this log till the slot catches up. I know these messages hold great value for debugging but in production, "waiting..", "aborting the wait.." might not be as helpful, maybe change it to debug? 2023-11-30 05:13:49.811 EST [6115] LOG: waiting for remote slot "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN (0/3047AC8) and catalog xmin (745) 2023-11-30 05:13:57.909 EST [6115] LOG: aborting the wait for remote slot "sub1" and moving to the next slot, will attempt creating it again 2023-11-30 05:14:07.921 EST [6115] LOG: waiting for remote slot "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN (0/3047AC8) and catalog xmin (745) 2. If a slot on the standby is in the "i" state as it hasn't been synced and it was invalidated on the primary, should you continuously retry creating this invalidated slot on the standby? 2023-11-30 06:21:41.844 EST [10563] LOG: waiting for remote slot "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN (0/EED9330) and catalog xmin (785) 2023-11-30 06:21:41.845 EST [10563] WARNING: slot "sub1" invalidated on the primary server, slot creation aborted 2023-11-30 06:21:51.892 EST [10563] LOG: waiting for remote slot "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN (0/EED9330) and catalog xmin (785) 2023-11-30 06:21:51.893 EST [10563] WARNING: slot "sub1" invalidated on the primary server, slot creation aborted 3. If creation of a slot on the standby fails for one slot because a slot of the same name exists, then thereafter no new sync slots are created on standby. Is this expected? I do see that previously created slots are kept up to date, just that no new slots are created after that. regards, Ajin Cherian Fujitsu australia
On Thu, Nov 30, 2023 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote: > > On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > This has been fixed. > > > > Best Regards, > > Hou zj > > Thanks for addressing my comments. Some comments from my testing of patch v41 > > 1. In my opinion, the second message "aborting the wait...moving to > the next slot" does not hold much value. There might not even be a > "next slot", there might be just one slot. I think the first LOG is > enough to indicate that the sync-slot is waiting as it repeats this > log till the slot catches up. I know these messages hold great value > for debugging but in production, "waiting..", "aborting the wait.." > might not be as helpful, maybe change it to debug? > > 2023-11-30 05:13:49.811 EST [6115] LOG: waiting for remote slot > "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN > (0/3047AC8) and catalog xmin (745) > 2023-11-30 05:13:57.909 EST [6115] LOG: aborting the wait for remote > slot "sub1" and moving to the next slot, will attempt creating it > again > 2023-11-30 05:14:07.921 EST [6115] LOG: waiting for remote slot > "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN > (0/3047AC8) and catalog xmin (745) > Sure, the message can be trimmed down. But I am not very sure if we should convert it to DEBUG. It might be useful to know what exactly is happening with this slot through the log file.Curious to know what others think here? > > 2. If a slot on the standby is in the "i" state as it hasn't been > synced and it was invalidated on the primary, should you continuously > retry creating this invalidated slot on the standby? > > 2023-11-30 06:21:41.844 EST [10563] LOG: waiting for remote slot > "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN > (0/EED9330) and catalog xmin (785) > 2023-11-30 06:21:41.845 EST [10563] WARNING: slot "sub1" invalidated > on the primary server, slot creation aborted > 2023-11-30 06:21:51.892 EST [10563] LOG: waiting for remote slot > "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN > (0/EED9330) and catalog xmin (785) > 2023-11-30 06:21:51.893 EST [10563] WARNING: slot "sub1" invalidated > on the primary server, slot creation aborted > No, it should not be synced after that. It should be marked as invalidated and skipped. And perhaps the state should also be moved to 'r' as we are done with it; but since it is invalidated, it will not be used further even if 'r'. > 3. If creation of a slot on the standby fails for one slot because a > slot of the same name exists, then thereafter no new sync slots are > created on standby. Is this expected? I do see that previously created > slots are kept up to date, just that no new slots are created after > that. > yes, it is done so as per the suggestion/discussion in [1]. It is done so that users can catch this issue at the earliest. [1]: https://www.postgresql.org/message-id/CAA4eK1J5D-Z7dFa89acf7O%2BCa6Y9bygTpi52KAKVCg%2BPE%2BZfog%40mail.gmail.com thanks Shveta
On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: I was reviewing the slotsync worker design and here are a few comments on the 0002 patch: 1. + if (!WalRcv || + (WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) I think we'd better take the spinlock when accessing these shared memory fields. 2. /* * The slot sync feature itself is disabled, exit. */ if (!enable_syncslot) { ereport(LOG, errmsg("exiting slot sync worker as enable_syncslot is disabled.")); Can we check the GUC when registering the worker (SlotSyncWorkerRegister), so that the worker won't be started if enable_syncslot is false? 3. In synchronize_one_slot, do we need to skip the slot sync and drop if the local slot is a physical one? 4. *locally_invalidated = (remote_slot->invalidated == RS_INVAL_NONE) && (local_slot->data.invalidated != RS_INVAL_NONE); When reading the invalidated flag of the local slot, I think we'd better take the spinlock. Best Regards, Hou zj
On Fri, Dec 1, 2023 at 9:40 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > I was reviewing slotsync worker design and here > are few comments on 0002 patch: Thanks for reviewing the patch. > > 1. > > + if (!WalRcv || > + (WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > > I think we'd better take spinlock when accessing these shared memory fields. > > 2. > > /* > * The slot sync feature itself is disabled, exit. > */ > if (!enable_syncslot) > { > ereport(LOG, > errmsg("exiting slot sync worker as enable_syncslot is disabled.")); > > Can we check the GUC when registering the worker(SlotSyncWorkerRegister), > so that the worker won't be started if enable_syncslot is false. > > 3. In synchronize_one_slot, do we need to skip the slot sync and drop if the > local slot is a physical one ? > IMO, if a local slot exists which is a physical one, it will be a user created slot and in that case worker will error out on finding existing slot with same name. And the case where local slot is physical one but not user-created is not possible on standby (assuming we have correct check on primary disallowing setting 'failover' property for physical slot). Do you have some other scenario in mind, which I am missing here? > 4. > > *locally_invalidated = > (remote_slot->invalidated == RS_INVAL_NONE) && > (local_slot->data.invalidated != RS_INVAL_NONE); > > When reading the invalidated flag of local slot, I think we'd better take > spinlock. > > Best Regards, > Hou zj
On Friday, December 1, 2023 12:51 PM shveta malik <shveta.malik@gmail.com> wrote: Hi, > > On Fri, Dec 1, 2023 at 9:40 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > I was reviewing slotsync worker design and here > > are few comments on 0002 patch: > > Thanks for reviewing the patch. > > > > > > > > 3. In synchronize_one_slot, do we need to skip the slot sync and drop if the > > local slot is a physical one ? > > > > IMO, if a local slot exists which is a physical one, it will be a user > created slot and in that case worker will error out on finding > existing slot with same name. And the case where local slot is > physical one but not user-created is not possible on standby (assuming > we have correct check on primary disallowing setting 'failover' > property for physical slot). Do you have some other scenario in mind, > which I am missing here? I was thinking about the race condition where the worker has confirmed that the slot is not a user-created one and entered the "sync_state == SYNCSLOT_STATE_READY" branch, but at this moment someone uses "DROP_REPLICATION_SLOT" to drop this slot and recreate another one (e.g. a physical one); the slotsync worker would then overwrite the fields of this physical slot. Although this affects user-created logical slots in similar cases as well. And the same is true for slotsync_drop_initiated_slots() and drop_obsolete_slots(): as we don't lock the slots in the list, if a user tries to drop and re-create an old slot concurrently, then we could drop a user-created slot here. Best Regards, Hou zj
On Fri, Dec 1, 2023 at 11:17 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, December 1, 2023 12:51 PM shveta malik <shveta.malik@gmail.com> wrote: > > Hi, > > > > > On Fri, Dec 1, 2023 at 9:40 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > I was reviewing slotsync worker design and here > > > are few comments on 0002 patch: > > > > Thanks for reviewing the patch. > > > > > > > > > > > 3. In synchronize_one_slot, do we need to skip the slot sync and drop if the > > > local slot is a physical one ? > > > > > > > IMO, if a local slot exists which is a physical one, it will be a user > > created slot and in that case worker will error out on finding > > existing slot with same name. And the case where local slot is > > physical one but not user-created is not possible on standby (assuming > > we have correct check on primary disallowing setting 'failover' > > property for physical slot). Do you have some other scenario in mind, > > which I am missing here? > > I was thinking about the race condition when it has confirmed that the slot is > not a user created one and enter "sync_state == SYNCSLOT_STATE_READY" branch, > but at this moment, if someone uses "DROP_REPLICATION_SLOT" to drop this slot and > recreate another one(e.g. a physical one), then the slotsync worker will > overwrite the fields of this physical slot. Although this affects user created > logical slots in similar cases as well. > User can not drop the synced slots on standby. It should result in ERROR. Currently we emit this error in pg_drop_replication_slot(), same is needed in "DROP_REPLICATION_SLOT" replication cmd. I will change it. Thanks for raising this point. I think, after this ERROR, there is no need to worry about physical slots handling in synchronize_one_slot(). > And the same is true for slotsync_drop_initiated_slots() and > drop_obsolete_slots(), as we don't lock the slots in the list, if user tri to > drop and re-create old slot concurrently, then we could drop user created slot > here. >
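A quick sketch of the behavior described above, i.e. that slots synchronized from the primary cannot be dropped on the standby; the SQL-level function is said to already error out, while the same protection is to be added to the DROP_REPLICATION_SLOT replication command (the exact error wording is an assumption):

    -- On the standby, against a slot that was synchronized from the primary:
    SELECT pg_drop_replication_slot('sub1');
    -- expected to fail with an ERROR rather than dropping the slot.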
On Fri, Dec 1, 2023 at 12:47 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Dec 1, 2023 at 11:17 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Friday, December 1, 2023 12:51 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Hi, > > > > > > > > On Fri, Dec 1, 2023 at 9:40 AM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > I was reviewing slotsync worker design and here > > > > are few comments on 0002 patch: > > > > > > Thanks for reviewing the patch. > > > > > > > > > > > > > > > 3. In synchronize_one_slot, do we need to skip the slot sync and drop if the > > > > local slot is a physical one ? > > > > > > > > > > IMO, if a local slot exists which is a physical one, it will be a user > > > created slot and in that case worker will error out on finding > > > existing slot with same name. And the case where local slot is > > > physical one but not user-created is not possible on standby (assuming > > > we have correct check on primary disallowing setting 'failover' > > > property for physical slot). Do you have some other scenario in mind, > > > which I am missing here? > > > > I was thinking about the race condition when it has confirmed that the slot is > > not a user created one and enter "sync_state == SYNCSLOT_STATE_READY" branch, > > but at this moment, if someone uses "DROP_REPLICATION_SLOT" to drop this slot and > > recreate another one(e.g. a physical one), then the slotsync worker will > > overwrite the fields of this physical slot. Although this affects user created > > logical slots in similar cases as well. > > > > User can not drop the synced slots on standby. It should result in > ERROR. Currently we emit this error in pg_drop_replication_slot(), > same is needed in "DROP_REPLICATION_SLOT" replication cmd. I will > change it. Thanks for raising this point. I think, after this ERROR, > there is no need to worry about physical slots handling in > synchronize_one_slot(). > > > And the same is true for slotsync_drop_initiated_slots() and > > drop_obsolete_slots(), as we don't lock the slots in the list, if user tri to > > drop and re-create old slot concurrently, then we could drop user created slot > > here. > > PFA v42. Changes: v42-0001: addressed comments in [1]. Thanks Hou-San for working on this. v42-0002: addressed comments in [2] and [3] [1]: https://www.postgresql.org/message-id/CAHut%2BPsMTvrwUBtcHff0CG_j-ALSuEta8xC1R_k0kjR%2B9A6ehg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAFPTHDb8LW4i9-nyvz%2BXVkJmmciZwYGivpH%3DaDOrDkBfHR_q9w%40mail.gmail.com [3]: https://www.postgresql.org/message-id/OS0PR01MB571678BABEDBE830062CAB119481A%40OS0PR01MB5716.jpnprd01.prod.outlook.com thanks Shveta
Attachment
Hi, On 11/30/23 1:06 PM, Ajin Cherian wrote: > On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu) > > 3. If creation of a slot on the standby fails for one slot because a > slot of the same name exists, then thereafter no new sync slots are > created on standby. Is this expected? I do see that previously created > slots are kept up to date, just that no new slots are created after > that. Yes this is the expected behavior as per discussion in [1]. Does this behavior make sense to you? [1]: https://www.postgresql.org/message-id/dd9dbbaf-ca77-423a-8d62-bfc814626b47%40gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Dec 1, 2023 at 3:43 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 11/30/23 1:06 PM, Ajin Cherian wrote: > > On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu) > > > > 3. If creation of a slot on the standby fails for one slot because a > > slot of the same name exists, then thereafter no new sync slots are > > created on standby. Is this expected? I do see that previously created > > slots are kept up to date, just that no new slots are created after > > that. > > Yes this is the expected behavior as per discussion in [1]. > Does this behavior make sense to you? > Not completely. Whether slots get synced in this case seems to depend on their order. Every time the worker restarts after the error (assuming the user has not taken corrective action yet), it will successfully sync the slots prior to the problematic one, while leaving the ones after it un-synced. I need a little more clarity on how the user is supposed to learn that the slot-sync worker (or any background worker, for that matter) has errored out. Is it only from the log file, or are there other mechanisms for this? I mean, does an ERROR have a better chance of catching the user's attention than a WARNING in the context of a background worker? I feel we can give this a second thought and see whether it is more appropriate to keep syncing the rest of the slots and skip the duplicate-name one. thanks Shveta
Hi, On 12/1/23 4:19 AM, shveta malik wrote: > On Thu, Nov 30, 2023 at 5:37 PM Ajin Cherian <itsajin@gmail.com> wrote: >> >> >> 1. In my opinion, the second message "aborting the wait...moving to >> the next slot" does not hold much value. There might not even be a >> "next slot", there might be just one slot. I think the first LOG is >> enough to indicate that the sync-slot is waiting as it repeats this >> log till the slot catches up. I know these messages hold great value >> for debugging but in production, "waiting..", "aborting the wait.." >> might not be as helpful, maybe change it to debug? >> >> 2023-11-30 05:13:49.811 EST [6115] LOG: waiting for remote slot >> "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN >> (0/3047AC8) and catalog xmin (745) >> 2023-11-30 05:13:57.909 EST [6115] LOG: aborting the wait for remote >> slot "sub1" and moving to the next slot, will attempt creating it >> again >> 2023-11-30 05:14:07.921 EST [6115] LOG: waiting for remote slot >> "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN >> (0/3047AC8) and catalog xmin (745) >> > > Sure, the message can be trimmed down. But I am not very sure if we > should convert it to DEBUG. It might be useful to know what exactly is > happening with this slot through the log file.Curious to know what > others think here? > I think LOG is fine for the "waiting" one but I'd be tempted to put part of the message in errdetail(). I think we could get rid of the "aborting" message (or move it to DEBUG). I mean if one does not see the "newly locally created slot" message then I think it's enough to guess the wait has been aborted or that it is still waiting. But that's probably just a matter of taste. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Nov 29, 2023 at 3:24 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 11/29/23 6:58 AM, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, November 28, 2023 8:07 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > >> On 11/27/23 9:57 AM, Zhijie Hou (Fujitsu) wrote: > >>> On Monday, November 27, 2023 4:51 PM shveta malik > >> <shveta.malik@gmail.com> wrote: > >>> > >>> Here is the updated version(v39_2) which include all the changes made in > >> 0002. > >>> Please use for review, and sorry for the confusion. > >>> > >> > >> Thanks! > >> > >> As far v39_2-0001: > >> > >> " > >> Altering the failover option of the subscription is currently not > >> permitted. However, this restriction may be lifted in future versions. > >> " > >> > >> Should we mention that we can alter the related replication slot? > > > > Will add. > > > >> > >> + <para> > >> + The implementation of failover requires that replication > >> + has successfully finished the initial table synchronization > >> + phase. So even when <literal>failover</literal> is enabled for a > >> + subscription, the internal failover state remains > >> + temporarily <quote>pending</quote> until the initialization > >> phase > >> + completes. See column > >> <structfield>subfailoverstate</structfield> > >> + of <link > >> linkend="catalog-pg-subscription"><structname>pg_subscription</structna > >> me></link> > >> + to know the actual failover state. > >> + </para> > >> > >> I think we have a corner case here. If one alter the replication slot on the > >> primary then "subfailoverstate" is not updated accordingly on the subscriber. > >> Given the 2 remarks above would that make sense to prevent altering a > >> replication slot associated to a subscription? > > > > Thanks for the review! > > > > I think we could not distinguish the user created logical slot or subscriber > > created slot as there is no related info in slot's data. > > Yeah that would need extra work. > > > And user could change > > the slot on subscription by "alter sub set (slot_name)", so maintaining this info > > would need some efforts. > > > > Yes. > > > Besides, I think this case overlaps the previous discussed "alter sub set > > (slot_name)" issue[1]. Both the cases are because the slot's failover is > > different from the subscription's failover setting. > > Yeah agree. > > > I think we could handle > > them similarly that user need to take care of not changing the failover to > > wrong value. Or do you prefer another approach that mentioned in that thread[1] > > ? (always alter the slot at the startup of apply worker). > > > > I think I'm fine with documenting the fact that the user should not change the failover > value. But if he does change it (because at the end nothing prevents it to do so) then > I think the meaning of subfailoverstate should still make sense. > How user can change the slot's failover property? Do we provide any command for it? -- With Regards, Amit Kapila.
Hi, On 12/1/23 12:06 PM, Amit Kapila wrote: > On Wed, Nov 29, 2023 at 3:24 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> I think I'm fine with documenting the fact that the user should not change the failover >> value. But if he does change it (because at the end nothing prevents it to do so) then >> I think the meaning of subfailoverstate should still make sense. >> > > How user can change the slot's failover property? Do we provide any > command for it? It's doable, using a replication connection: " $ psql replication=database psql (17devel) Type "help" for help. postgres=# ALTER_REPLICATION_SLOT logical_slot6 (FAILOVER false); ALTER_REPLICATION_SLOT " Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Review for v41 patch. 1. ====== src/backend/utils/misc/postgresql.conf.sample +#enable_syncslot = on # enables slot synchronization on the physical standby from the primary enable_syncslot is disabled by default, so, it should be 'off' here. ~~~ 2. IIUC, the slotsyncworker's connection to the primary is to execute a query. Its aim is not walsender type connection, but at primary when queried, the 'backend_type' is set to 'walsender'. Snippet from primary db- datname | usename | application_name | wait_event_type | backend_type ---------+-------------+------------------+-----------------+-------------- postgres | replication | slotsyncworker | Client | walsender Is it okay? ~~~ 3. As per current logic, If there are slots on primary with disabled subscriptions, then, when standby is created it replicates these slots but can't make them sync-ready until any activity happens on the slots. So, such slots stay in 'i' sync-state and get dropped when failover happens. Now, if the subscriber tries to enable their existing subscription after failover, it gives an error that the slot does not exist. ~~~ 4. primary_slot_name GUC value test: When standby is started with a non-existing primary_slot_name, the wal-receiver gives an error but the slot-sync worker does not raise any error/warning. It is no-op though as it has a check 'if (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this okay or shall the slot-sync worker too raise an error and exit? In another case, when standby is started with valid primary_slot_name, but it is changed to some invalid value in runtime, then walreceiver starts giving error but the slot-sync worker keeps on running. In this case, unlike the previous case, it even did not go to no-op mode (as it sees valid WalRcv->latestWalEnd from the earlier run) and keep pinging primary repeatedly for slots. Shall here it should error out or at least be no-op until we give a valid primary_slot_name? -- Thanks, Nisha
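For completeness, the query behind the snippet shown in point 2; it can be run on the primary to spot the slot-sync worker's connection (the application_name value is taken from the output above):

    SELECT datname, usename, application_name, wait_event_type, backend_type
    FROM pg_stat_activity
    WHERE application_name = 'slotsyncworker';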
On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > Review for v41 patch. > > 1. > ====== > src/backend/utils/misc/postgresql.conf.sample > > +#enable_syncslot = on # enables slot synchronization on the physical > standby from the primary > > enable_syncslot is disabled by default, so, it should be 'off' here. > > ~~~ > 2. > IIUC, the slotsyncworker's connection to the primary is to execute a > query. Its aim is not walsender type connection, but at primary when > queried, the 'backend_type' is set to 'walsender'. > Snippet from primary db- > > datname | usename | application_name | wait_event_type | backend_type > ---------+-------------+------------------+-----------------+-------------- > postgres | replication | slotsyncworker | Client | walsender > > Is it okay? > > ~~~ > 3. > As per current logic, If there are slots on primary with disabled > subscriptions, then, when standby is created it replicates these slots > but can't make them sync-ready until any activity happens on the > slots. > So, such slots stay in 'i' sync-state and get dropped when failover > happens. Now, if the subscriber tries to enable their existing > subscription after failover, it gives an error that the slot does not > exist. Is this behavior expected? If yes, then is it worth documenting about disabled subscription slots not being synced? -- Thanks, Nisha
On Fri, Dec 1, 2023 at 9:31 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Review for v41 patch. > > > > 1. > > ====== > > src/backend/utils/misc/postgresql.conf.sample > > > > +#enable_syncslot = on # enables slot synchronization on the physical > > standby from the primary > > > > enable_syncslot is disabled by default, so, it should be 'off' here. > > > > ~~~ > > 2. > > IIUC, the slotsyncworker's connection to the primary is to execute a > > query. Its aim is not walsender type connection, but at primary when > > queried, the 'backend_type' is set to 'walsender'. > > Snippet from primary db- > > > > datname | usename | application_name | wait_event_type | backend_type > > ---------+-------------+------------------+-----------------+-------------- > > postgres | replication | slotsyncworker | Client | walsender > > > > Is it okay? > > > > ~~~ > > 3. > > As per current logic, If there are slots on primary with disabled > > subscriptions, then, when standby is created it replicates these slots > > but can't make them sync-ready until any activity happens on the > > slots. > > So, such slots stay in 'i' sync-state and get dropped when failover > > happens. Now, if the subscriber tries to enable their existing > > subscription after failover, it gives an error that the slot does not > > exist. > > Is this behavior expected? If yes, then is it worth documenting about > disabled subscription slots not being synced? > This is expected behavior because even if we would retain such slots (state 'i'), we won't be able to make them in the 'ready' state after failover because we can't get the required WAL from the primary. -- With Regards, Amit Kapila.
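As a concrete illustration of the scenario in point 3, a hedged sketch of what the subscriber side would see after failover when re-enabling a subscription whose slot never left the 'i' state and was therefore dropped on promotion (names are made up; the failure surfaces in the apply worker's log rather than in the ALTER command itself):

    -- On the subscriber, after the standby has been promoted:
    ALTER SUBSCRIPTION mysub ENABLE;
    -- the apply worker is then expected to report that the replication slot
    -- for mysub does not exist on the new primary.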
On Wednesday, November 29, 2023 5:55 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: Hi, > On 11/29/23 6:58 AM, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, November 28, 2023 8:07 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > >> On 11/27/23 9:57 AM, Zhijie Hou (Fujitsu) wrote: > >>> On Monday, November 27, 2023 4:51 PM shveta malik > >> <shveta.malik@gmail.com> wrote: > >>> > >>> Here is the updated version(v39_2) which include all the changes > >>> made in > >> 0002. > >>> Please use for review, and sorry for the confusion. > >>> > >> > >> Thanks! > >> > >> As far v39_2-0001: > >> > >> " > >> Altering the failover option of the subscription is currently not > >> permitted. However, this restriction may be lifted in future versions. > >> " > >> > >> Should we mention that we can alter the related replication slot? > > > > Will add. > > > >> > >> + <para> > >> + The implementation of failover requires that replication > >> + has successfully finished the initial table synchronization > >> + phase. So even when <literal>failover</literal> is enabled for a > >> + subscription, the internal failover state remains > >> + temporarily <quote>pending</quote> until the > >> + initialization > >> phase > >> + completes. See column > >> <structfield>subfailoverstate</structfield> > >> + of <link > >> > linkend="catalog-pg-subscription"><structname>pg_subscription</struct > >> na > >> me></link> > >> + to know the actual failover state. > >> + </para> > >> > >> I think we have a corner case here. If one alter the replication slot > >> on the primary then "subfailoverstate" is not updated accordingly on the > subscriber. > >> Given the 2 remarks above would that make sense to prevent altering a > >> replication slot associated to a subscription? > > > > Thanks for the review! > > > > I think we could not distinguish the user created logical slot or > > subscriber created slot as there is no related info in slot's data. > > Yeah that would need extra work. > > > And user could change > > the slot on subscription by "alter sub set (slot_name)", so > > maintaining this info would need some efforts. > > > > Yes. > > > Besides, I think this case overlaps the previous discussed "alter sub > > set (slot_name)" issue[1]. Both the cases are because the slot's > > failover is different from the subscription's failover setting. > > Yeah agree. > > > I think we could handle > > them similarly that user need to take care of not changing the > > failover to wrong value. Or do you prefer another approach that > > mentioned in that thread[1] ? (always alter the slot at the startup of apply > worker). > > > > I think I'm fine with documenting the fact that the user should not change the > failover value. But if he does change it (because at the end nothing prevents it > to do so) then I think the meaning of subfailoverstate should still make sense. > > One way to achieve this could be to change its meaning? Say rename it to say > subfailovercreationstate (to reflect the fact that it was the state at the creation > time) and change messages like: > > " > ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover > is enabled " > > to something like > > " > ALTER SUBSCRIPTION with refresh and copy_data is not allowed for > subscription created with failover enabled" > " > > and change the doc accordingly. > > What do you think? 
This idea may work for now, but I think we planned to support ALTER SUBSCRIPTION (failover) in a later patch, which means the meaning of subfailovercreationstate may become invalid after that because we will be able to change this value using ALTER SUBSCRIPTION as well. I think documenting the case is OK because: Currently, users can already create similar inconsistency cases as we don't restrict users from changing the slot on the publisher. E.g., a user could drop and recreate the slot used by a subscription but with different settings, or use ALTER SUBSCRIPTION ... SET (slot_name) to switch to a new slot with different settings. For example, regarding the two_phase option, a user can create a subscription with two_phase disabled and later point the subscription's slot_name to a new slot with two_phase enabled, which is a case similar to the failover one. Best Regards, Hou zj
On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > Review for v41 patch. Thanks for the feedback. > > 1. > ====== > src/backend/utils/misc/postgresql.conf.sample > > +#enable_syncslot = on # enables slot synchronization on the physical > standby from the primary > > enable_syncslot is disabled by default, so, it should be 'off' here. > Sure, I will change it. > ~~~ > 2. > IIUC, the slotsyncworker's connection to the primary is to execute a > query. Its aim is not walsender type connection, but at primary when > queried, the 'backend_type' is set to 'walsender'. > Snippet from primary db- > > datname | usename | application_name | wait_event_type | backend_type > ---------+-------------+------------------+-----------------+-------------- > postgres | replication | slotsyncworker | Client | walsender > > Is it okay? > Slot sync worker uses 'libpqrcv_connect' for connection which sends 'replication'-'database' key-value pair as one of the connection options. And on the primary side, 'ProcessStartupPacket' on the basis of this key-value pair sets the process as walsender one (am_walsender = true). And thus this reflects as backend_type='walsender' in pg_stat_activity. I do not see any harm in this backend_type for slot-sync worker currently. This is on a similar line of connections used for logical-replications. And since a slot-sync worker also deals with wals-positions (lsns), it is okay to maintain backend_type as walsender unless you (or others) see any potential issue in doing that. So let me know. > ~~~ > 3. > As per current logic, If there are slots on primary with disabled > subscriptions, then, when standby is created it replicates these slots > but can't make them sync-ready until any activity happens on the > slots. > So, such slots stay in 'i' sync-state and get dropped when failover > happens. Now, if the subscriber tries to enable their existing > subscription after failover, it gives an error that the slot does not > exist. > yes, this is expected as Amit explained in [1]. But let me review if we need to document this case for disabled subscriptions. i.e. disabled subscription if enabled after promotion might not work. > ~~~ > 4. primary_slot_name GUC value test: > > When standby is started with a non-existing primary_slot_name, the > wal-receiver gives an error but the slot-sync worker does not raise > any error/warning. It is no-op though as it has a check 'if > (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this > okay or shall the slot-sync worker too raise an error and exit? > > In another case, when standby is started with valid primary_slot_name, > but it is changed to some invalid value in runtime, then walreceiver > starts giving error but the slot-sync worker keeps on running. In this > case, unlike the previous case, it even did not go to no-op mode (as > it sees valid WalRcv->latestWalEnd from the earlier run) and keep > pinging primary repeatedly for slots. Shall here it should error out > or at least be no-op until we give a valid primary_slot_name? > I reviewed it. There is no way to test the existence/validity of 'primary_slot_name' on standby without making a connection to primary. If primary_slot_name is invalid from the start, slot-sync worker will be no-op (as you tested) as WalRecv->latestWalENd will be invalid, and if 'primary_slot_name' is changed to invalid on runtime, slot-sync worker will still keep on pinging primary. 
But that should be okay (in fact needed) as it needs to sync at least the slots' previous positions (in case it was delayed in doing so for some reason earlier). And once the slots are up-to-date on the standby, even if the worker pings the primary, it will not see any change in the slot LSNs and will thus go for a longer nap. I think it is not worth the effort to introduce the complexity of checking the validity of 'primary_slot_name' on the primary from the standby for this rare scenario. It will be good to know the thoughts of others on the above 3 points. thanks Shveta
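As a concrete illustration of the naptime behaviour described above, a back-off loop of roughly the following shape would do. This is a sketch only: the adjust_naptime() name and the constants are illustrative (the 10ms/10s bounds match the values mentioned for the worker elsewhere in this thread), not the patch's actual code.

/* Sketch only: naptime back-off for the slot sync worker. */
#include "postgres.h"

#define MIN_WORKER_NAPTIME_MS	10		/* poll quickly while slots move */
#define MAX_WORKER_NAPTIME_MS	10000	/* back off to 10s when idle */

static long naptime_ms = MIN_WORKER_NAPTIME_MS;

static void
adjust_naptime(bool some_slot_updated)
{
	if (some_slot_updated)
	{
		/* activity seen on the primary: go back to fast polling */
		naptime_ms = MIN_WORKER_NAPTIME_MS;
	}
	else
	{
		/* nothing changed since the last cycle: nap longer, up to the cap */
		naptime_ms = Min(naptime_ms * 10, MAX_WORKER_NAPTIME_MS);
	}
}

With such a scheme, repeated pings of an idle primary only cost one query per (increasingly long) nap, which is why keeping the worker pinging in this rare case is cheap.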
On Mon, Dec 4, 2023 at 10:40 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Review for v41 patch. > > Thanks for the feedback. > > > > > 1. > > ====== > > src/backend/utils/misc/postgresql.conf.sample > > > > +#enable_syncslot = on # enables slot synchronization on the physical > > standby from the primary > > > > enable_syncslot is disabled by default, so, it should be 'off' here. > > > > Sure, I will change it. > > > ~~~ > > 2. > > IIUC, the slotsyncworker's connection to the primary is to execute a > > query. Its aim is not walsender type connection, but at primary when > > queried, the 'backend_type' is set to 'walsender'. > > Snippet from primary db- > > > > datname | usename | application_name | wait_event_type | backend_type > > ---------+-------------+------------------+-----------------+-------------- > > postgres | replication | slotsyncworker | Client | walsender > > > > Is it okay? > > > > Slot sync worker uses 'libpqrcv_connect' for connection which sends > 'replication'-'database' key-value pair as one of the connection > options. And on the primary side, 'ProcessStartupPacket' on the basis > of this key-value pair sets the process as walsender one (am_walsender > = true). > And thus this reflects as backend_type='walsender' in > pg_stat_activity. I do not see any harm in this backend_type for > slot-sync worker currently. This is on a similar line of connections > used for logical-replications. And since a slot-sync worker also deals > with wals-positions (lsns), it is okay to maintain backend_type as > walsender unless you (or others) see any potential issue in doing > that. So let me know. > > > ~~~ > > 3. > > As per current logic, If there are slots on primary with disabled > > subscriptions, then, when standby is created it replicates these slots > > but can't make them sync-ready until any activity happens on the > > slots. > > So, such slots stay in 'i' sync-state and get dropped when failover > > happens. Now, if the subscriber tries to enable their existing > > subscription after failover, it gives an error that the slot does not > > exist. > > > > yes, this is expected as Amit explained in [1]. But let me review if > we need to document this case for disabled subscriptions. i.e. > disabled subscription if enabled after promotion might not work. Sorry, missed to mention the link earlier: [1]: https://www.postgresql.org/message-id/CAA4eK1J5Hxp%2BzhvptyyjqQ4JSQzwnkFRXtQn8v9opxtZmmY_Ug%40mail.gmail.com
Hi, On 12/4/23 4:33 AM, Zhijie Hou (Fujitsu) wrote: > On Wednesday, November 29, 2023 5:55 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > >> I think I'm fine with documenting the fact that the user should not change the >> failover value. But if he does change it (because at the end nothing prevents it >> to do so) then I think the meaning of subfailoverstate should still make sense. >> >> One way to achieve this could be to change its meaning? Say rename it to say >> subfailovercreationstate (to reflect the fact that it was the state at the creation >> time) and change messages like: >> >> " >> ALTER SUBSCRIPTION with refresh and copy_data is not allowed when failover >> is enabled " >> >> to something like >> >> " >> ALTER SUBSCRIPTION with refresh and copy_data is not allowed for >> subscription created with failover enabled" >> " >> >> and change the doc accordingly. >> >> What do you think? > I think document the case is OK because: > > Currently, user already can create similar inconsistency cases as we don't restrict > user to change the slot on publisher. E.g., User could drop and recreate the > slot used by subscription but with different setting. Or user ALTER > SUBSCRIPTION set (slot_name) to switch to a new slot with different setting. > > For example, about two_phase option, user can create a subscription with > two_phase disabled, then later it can set subscription slot_name to a new slot > with two_phase enabled which is the similar case as the failover. > Yeah, right, did not think that such "inconsistency" can already happen. So agree to keep "subfailoverstate" and "just" document the case then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 12/4/23 6:10 AM, shveta malik wrote: > On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote: >> >> Review for v41 patch. > > Thanks for the feedback. > >> ~~~ >> 2. >> IIUC, the slotsyncworker's connection to the primary is to execute a >> query. Its aim is not walsender type connection, but at primary when >> queried, the 'backend_type' is set to 'walsender'. >> Snippet from primary db- >> >> datname | usename | application_name | wait_event_type | backend_type >> ---------+-------------+------------------+-----------------+-------------- >> postgres | replication | slotsyncworker | Client | walsender >> >> Is it okay? >> > > Slot sync worker uses 'libpqrcv_connect' for connection which sends > 'replication'-'database' key-value pair as one of the connection > options. And on the primary side, 'ProcessStartupPacket' on the basis > of this key-value pair sets the process as walsender one (am_walsender > = true). > And thus this reflects as backend_type='walsender' in > pg_stat_activity. I do not see any harm in this backend_type for > slot-sync worker currently. This is on a similar line of connections > used for logical-replications. And since a slot-sync worker also deals > with wals-positions (lsns), it is okay to maintain backend_type as > walsender unless you (or others) see any potential issue in doing > that. So let me know. I don't see any issue as well (though I understand it might seems weird to see a walsender process being spawned doing non replication stuff) > >> ~~~ >> 4. primary_slot_name GUC value test: >> >> When standby is started with a non-existing primary_slot_name, the >> wal-receiver gives an error but the slot-sync worker does not raise >> any error/warning. It is no-op though as it has a check 'if >> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this >> okay or shall the slot-sync worker too raise an error and exit? >> >> In another case, when standby is started with valid primary_slot_name, >> but it is changed to some invalid value in runtime, then walreceiver >> starts giving error but the slot-sync worker keeps on running. In this >> case, unlike the previous case, it even did not go to no-op mode (as >> it sees valid WalRcv->latestWalEnd from the earlier run) and keep >> pinging primary repeatedly for slots. Shall here it should error out >> or at least be no-op until we give a valid primary_slot_name? >> > Nice catch, thanks! > I reviewed it. There is no way to test the existence/validity of > 'primary_slot_name' on standby without making a connection to primary. > If primary_slot_name is invalid from the start, slot-sync worker will > be no-op (as you tested) as WalRecv->latestWalENd will be invalid, and > if 'primary_slot_name' is changed to invalid on runtime, slot-sync > worker will still keep on pinging primary. But that should be okay (in > fact needed) as it needs to sync at-least the previous slot's > positions (in case it is delayed in doing so for some reason earlier). > And once the slots are up-to-date on standby, even if worker pings > primary, it will not see any change in slots lsns and thus go for > longer nap. I think, it is not worth the effort to introduce the > complexity of checking validity of 'primary_slot_name' on primary from > standby for this rare scenario. > Maybe another option could be to have the walreceiver a way to let the slot sync worker knows that it (the walreceiver) was not able to start due to non existing replication slot on the primary? 
(that way we'd avoid the slot sync worker having to talk to the primary). Not sure about the extra effort to make it work, though. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 12/4/23 6:10 AM, shveta malik wrote: > > On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > >> > >> Review for v41 patch. > > > > Thanks for the feedback. > > > >> ~~~ > >> 2. > >> IIUC, the slotsyncworker's connection to the primary is to execute a > >> query. Its aim is not walsender type connection, but at primary when > >> queried, the 'backend_type' is set to 'walsender'. > >> Snippet from primary db- > >> > >> datname | usename | application_name | wait_event_type | backend_type > >> ---------+-------------+------------------+-----------------+-------------- > >> postgres | replication | slotsyncworker | Client | walsender > >> > >> Is it okay? > >> > > > > Slot sync worker uses 'libpqrcv_connect' for connection which sends > > 'replication'-'database' key-value pair as one of the connection > > options. And on the primary side, 'ProcessStartupPacket' on the basis > > of this key-value pair sets the process as walsender one (am_walsender > > = true). > > And thus this reflects as backend_type='walsender' in > > pg_stat_activity. I do not see any harm in this backend_type for > > slot-sync worker currently. This is on a similar line of connections > > used for logical-replications. And since a slot-sync worker also deals > > with wals-positions (lsns), it is okay to maintain backend_type as > > walsender unless you (or others) see any potential issue in doing > > that. So let me know. > > I don't see any issue as well (though I understand it might > seems weird to see a walsender process being spawned doing non > replication stuff) > > > > >> ~~~ > >> 4. primary_slot_name GUC value test: > >> > >> When standby is started with a non-existing primary_slot_name, the > >> wal-receiver gives an error but the slot-sync worker does not raise > >> any error/warning. It is no-op though as it has a check 'if > >> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this > >> okay or shall the slot-sync worker too raise an error and exit? > >> > >> In another case, when standby is started with valid primary_slot_name, > >> but it is changed to some invalid value in runtime, then walreceiver > >> starts giving error but the slot-sync worker keeps on running. In this > >> case, unlike the previous case, it even did not go to no-op mode (as > >> it sees valid WalRcv->latestWalEnd from the earlier run) and keep > >> pinging primary repeatedly for slots. Shall here it should error out > >> or at least be no-op until we give a valid primary_slot_name? > >> > > > > Nice catch, thanks! > > > I reviewed it. There is no way to test the existence/validity of > > 'primary_slot_name' on standby without making a connection to primary. > > If primary_slot_name is invalid from the start, slot-sync worker will > > be no-op (as you tested) as WalRecv->latestWalENd will be invalid, and > > if 'primary_slot_name' is changed to invalid on runtime, slot-sync > > worker will still keep on pinging primary. But that should be okay (in > > fact needed) as it needs to sync at-least the previous slot's > > positions (in case it is delayed in doing so for some reason earlier). > > And once the slots are up-to-date on standby, even if worker pings > > primary, it will not see any change in slots lsns and thus go for > > longer nap. 
I think, it is not worth the effort to introduce the > > complexity of checking validity of 'primary_slot_name' on primary from > > standby for this rare scenario. > > > > Maybe another option could be to have the walreceiver a way to let the slot sync > worker knows that it (the walreceiver) was not able to start due to non existing > replication slot on the primary? (that way we'd avoid the slot sync worker having > to talk to the primary). Few points: 1) I think if we do it, we should do it in generic way i.e. slotsync worker should go to no-op if walreceiver is not able to start due to any reason and not only due to invalid primary_slot_name. 2) Secondly, slotsync worker needs to make sure it has synced the slots so far i.e. worker should not go to no-op immediately on seeing missing WalRcv process if there are pending slots to be synced. So the generic way I see to have this optimization is: 1) Slotsync worker can use 'WalRcv->pid' to figure out if WalReceiver is running or not. 2) Slotsync worker should check null 'WalRcv->pid' only when no-activity is observed for threshold time i.e. it can do it during existing logic of increasing naptime. 3) On finding null 'WalRcv->pid', worker can mark a flag to go to no-op unless WalRcv->pid becomes valid again. Marking this flag during increasing naptime will guarantee that the worker has taken all the changes so far i.e. standby is not lagging in terms of slots. Thoughts? thanks Shveta
Hi, On 12/5/23 6:08 AM, shveta malik wrote: > On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> Maybe another option could be to have the walreceiver a way to let the slot sync >> worker knows that it (the walreceiver) was not able to start due to non existing >> replication slot on the primary? (that way we'd avoid the slot sync worker having >> to talk to the primary). > > Few points: > 1) I think if we do it, we should do it in generic way i.e. slotsync > worker should go to no-op if walreceiver is not able to start due to > any reason and not only due to invalid primary_slot_name. Agree. > 2) Secondly, slotsync worker needs to make sure it has synced the > slots so far i.e. worker should not go to no-op immediately on seeing > missing WalRcv process if there are pending slots to be synced. Agree. > So the generic way I see to have this optimization is: > 1) Slotsync worker can use 'WalRcv->pid' to figure out if WalReceiver > is running or not. Not sure that would work because the walreceiver keeps try re-starting and so get a pid before reaching the "could not start WAL streaming: ERROR: replication slot "XXXX" does not exist" error. We may want to add an extra check on walrcv->walRcvState (or should/could be enough by its own). But walrcv->walRcvState is set to WALRCV_STREAMING way before walrcv_startstreaming(). Wouldn't that make sense to move it once we are sure that walrcv_startstreaming() returns true and first_stream is true, here? " if (first_stream) + { ereport(LOG, (errmsg("started streaming WAL from primary at %X/%X on timeline %u", LSN_FORMAT_ARGS(startpoint), startpointTLI))); + SpinLockAcquire(&walrcv->mutex); + walrcv->walRcvState = WALRCV_STREAMING; + SpinLockRelease(&walrcv->mutex); + } " > 2) Slotsync worker should check null 'WalRcv->pid' only when > no-activity is observed for threshold time i.e. it can do it during > existing logic of increasing naptime. > 3) On finding null 'WalRcv->pid', worker can mark a flag to go to > no-op unless WalRcv->pid becomes valid again. Marking this flag during > increasing naptime will guarantee that the worker has taken all the > changes so far i.e. standby is not lagging in terms of slots. > 2) and 3) looks good to me but with a check on walrcv->walRcvState looking for WALRCV_STREAMING state instead of looking for a non null WalRcv->pid. And only if it makes sense to move the walrcv->walRcvState = WALRCV_STREAMING as mentioned above (I think it does). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Dec 5, 2023 at 2:18 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 12/5/23 6:08 AM, shveta malik wrote: > > On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> Maybe another option could be to have the walreceiver a way to let the slot sync > >> worker knows that it (the walreceiver) was not able to start due to non existing > >> replication slot on the primary? (that way we'd avoid the slot sync worker having > >> to talk to the primary). > > > > Few points: > > 1) I think if we do it, we should do it in generic way i.e. slotsync > > worker should go to no-op if walreceiver is not able to start due to > > any reason and not only due to invalid primary_slot_name. > > Agree. > > > 2) Secondly, slotsync worker needs to make sure it has synced the > > slots so far i.e. worker should not go to no-op immediately on seeing > > missing WalRcv process if there are pending slots to be synced. > > Agree. > > > So the generic way I see to have this optimization is: > > 1) Slotsync worker can use 'WalRcv->pid' to figure out if WalReceiver > > is running or not. > > Not sure that would work because the walreceiver keeps try re-starting > and so get a pid before reaching the "could not start WAL streaming: ERROR: replication slot "XXXX" does not exist" > error. > yes, right. pid will keep on toggling. > We may want to add an extra check on walrcv->walRcvState (or should/could be enough by its own). > But walrcv->walRcvState is set to WALRCV_STREAMING way before walrcv_startstreaming(). > Agree. Check on 'walrcv->walRcvState' alone should suffice. > Wouldn't that make sense to move it once we are sure that > walrcv_startstreaming() returns true and first_stream is true, here? > > " > if (first_stream) > + { > ereport(LOG, > (errmsg("started streaming WAL from primary at %X/%X on timeline %u", > LSN_FORMAT_ARGS(startpoint), startpointTLI))); > + SpinLockAcquire(&walrcv->mutex); > + walrcv->walRcvState = WALRCV_STREAMING; > + SpinLockRelease(&walrcv->mutex); > + } > " > Yes, it makes sense and is the basis for current slot-sync worker changes being discussed. > > 2) Slotsync worker should check null 'WalRcv->pid' only when > > no-activity is observed for threshold time i.e. it can do it during > > existing logic of increasing naptime. > > 3) On finding null 'WalRcv->pid', worker can mark a flag to go to > > no-op unless WalRcv->pid becomes valid again. Marking this flag during > > increasing naptime will guarantee that the worker has taken all the > > changes so far i.e. standby is not lagging in terms of slots. > > > > 2) and 3) looks good to me but with a check on walrcv->walRcvState > looking for WALRCV_STREAMING state instead of looking for a non null > WalRcv->pid. yes. But I think, the worker should enter no-op, when walRcvState is WALRCV_STOPPED and not when walRcvState != WALRCV_STREAMING as it is okay to have WALRCV_WAITING/STARTING/RESTARTING. But the worker should exit no-op only when it finds walRcvState switched back to WALRCV_STREAMING. > > And only if it makes sense to move the walrcv->walRcvState = WALRCV_STREAMING as > mentioned above (I think it does). > yes, I agree. thanks Shveta
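To make the state checks being agreed on here concrete, a minimal sketch is shown below. The in_noop flag and the update_noop_state() name are hypothetical; only WalRcv, its mutex, and the WalRcvState values are existing symbols from replication/walreceiver.h, and the sketch also assumes WALRCV_STREAMING is only set once streaming has really started, as proposed above.

/* Sketch only: enter/leave no-op mode based on the walreceiver state. */
#include "postgres.h"
#include "replication/walreceiver.h"
#include "storage/spin.h"

static bool in_noop = false;	/* hypothetical worker-local flag */

static void
update_noop_state(void)
{
	WalRcvState state;

	SpinLockAcquire(&WalRcv->mutex);
	state = WalRcv->walRcvState;
	SpinLockRelease(&WalRcv->mutex);

	if (!in_noop)
	{
		/*
		 * Enter no-op only on WALRCV_STOPPED, and only from the code path
		 * that increases the naptime, i.e. after the worker has already been
		 * idle long enough to have synced any pending slot positions.
		 */
		if (state == WALRCV_STOPPED)
			in_noop = true;
	}
	else
	{
		/* Resume syncing only once streaming has actually (re)started. */
		if (state == WALRCV_STREAMING)
			in_noop = false;
	}
}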
On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > >> ~~~ > > >> 4. primary_slot_name GUC value test: > > >> > > >> When standby is started with a non-existing primary_slot_name, the > > >> wal-receiver gives an error but the slot-sync worker does not raise > > >> any error/warning. It is no-op though as it has a check 'if > > >> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this > > >> okay or shall the slot-sync worker too raise an error and exit? > > >> > > >> In another case, when standby is started with valid primary_slot_name, > > >> but it is changed to some invalid value in runtime, then walreceiver > > >> starts giving error but the slot-sync worker keeps on running. In this > > >> case, unlike the previous case, it even did not go to no-op mode (as > > >> it sees valid WalRcv->latestWalEnd from the earlier run) and keep > > >> pinging primary repeatedly for slots. Shall here it should error out > > >> or at least be no-op until we give a valid primary_slot_name? > > >> > > > > > > > Nice catch, thanks! > > > > > I reviewed it. There is no way to test the existence/validity of > > > 'primary_slot_name' on standby without making a connection to primary. > > > If primary_slot_name is invalid from the start, slot-sync worker will > > > be no-op (as you tested) as WalRecv->latestWalENd will be invalid, and > > > if 'primary_slot_name' is changed to invalid on runtime, slot-sync > > > worker will still keep on pinging primary. But that should be okay (in > > > fact needed) as it needs to sync at-least the previous slot's > > > positions (in case it is delayed in doing so for some reason earlier). > > > And once the slots are up-to-date on standby, even if worker pings > > > primary, it will not see any change in slots lsns and thus go for > > > longer nap. I think, it is not worth the effort to introduce the > > > complexity of checking validity of 'primary_slot_name' on primary from > > > standby for this rare scenario. > > > > > > > Maybe another option could be to have the walreceiver a way to let the slot sync > > worker knows that it (the walreceiver) was not able to start due to non existing > > replication slot on the primary? (that way we'd avoid the slot sync worker having > > to talk to the primary). > > Few points: > 1) I think if we do it, we should do it in generic way i.e. slotsync > worker should go to no-op if walreceiver is not able to start due to > any reason and not only due to invalid primary_slot_name. > 2) Secondly, slotsync worker needs to make sure it has synced the > slots so far i.e. worker should not go to no-op immediately on seeing > missing WalRcv process if there are pending slots to be synced. > Won't it be better to just ping and check the validity of 'primary_slot_name' at the start of slot-sync and if it is changed anytime? I think it would be better to avoid adding dependency on walreciever state as that sounds like needless complexity. -- With Regards, Amit Kapila.
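For reference, the ping suggested here could be a single query over the walreceiver connection machinery the worker already uses. The sketch below is illustrative only (the function name and the error wording are made up), and assumes a connection already obtained via libpqrcv_connect(); it uses only the existing walrcv_exec()/tuplestore APIs.

/* Sketch only: check that primary_slot_name exists on the primary. */
#include "postgres.h"
#include "catalog/pg_type.h"
#include "executor/tuptable.h"
#include "lib/stringinfo.h"
#include "replication/walreceiver.h"
#include "utils/builtins.h"
#include "utils/tuplestore.h"

static bool
primary_slot_name_is_valid(WalReceiverConn *wrconn, const char *slot_name)
{
	StringInfoData cmd;
	Oid			retType[1] = {BOOLOID};
	WalRcvExecResult *res;
	TupleTableSlot *tupslot;
	bool		exists = false;

	initStringInfo(&cmd);
	appendStringInfo(&cmd,
					 "SELECT count(*) > 0"
					 " FROM pg_catalog.pg_replication_slots"
					 " WHERE slot_type = 'physical' AND slot_name = %s",
					 quote_literal_cstr(slot_name));

	res = walrcv_exec(wrconn, cmd.data, 1, retType);
	pfree(cmd.data);

	if (res->status != WALRCV_OK_TUPLES)
		ereport(ERROR,
				(errmsg("could not check primary_slot_name \"%s\" on the primary: %s",
						slot_name, res->err)));

	tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
	if (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
	{
		bool		isnull;

		exists = DatumGetBool(slot_getattr(tupslot, 1, &isnull));
	}
	ExecDropSingleTupleTableSlot(tupslot);
	walrcv_clear_result(res);

	return exists;
}

Calling something like this once at worker start and again whenever the GUC changes matches the "only once in the beginning" cost argued for above.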
Hi, On 12/5/23 11:29 AM, shveta malik wrote: > On Tue, Dec 5, 2023 at 2:18 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> Wouldn't that make sense to move it once we are sure that >> walrcv_startstreaming() returns true and first_stream is true, here? >> >> " >> if (first_stream) >> + { >> ereport(LOG, >> (errmsg("started streaming WAL from primary at %X/%X on timeline %u", >> LSN_FORMAT_ARGS(startpoint), startpointTLI))); >> + SpinLockAcquire(&walrcv->mutex); >> + walrcv->walRcvState = WALRCV_STREAMING; >> + SpinLockRelease(&walrcv->mutex); >> + } >> " >> > > Yes, it makes sense and is the basis for current slot-sync worker > changes being discussed. I think this change deserves its own dedicated thread and patch, does that make sense? If so, I'll submit one. >> >> 2) and 3) looks good to me but with a check on walrcv->walRcvState >> looking for WALRCV_STREAMING state instead of looking for a non null >> WalRcv->pid. > > yes. But I think, the worker should enter no-op, when walRcvState is > WALRCV_STOPPED and not when walRcvState != WALRCV_STREAMING as it is > okay to have WALRCV_WAITING/STARTING/RESTARTING. But the worker should > exit no-op only when it finds walRcvState switched back to > WALRCV_STREAMING. > Yeah, fully agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 12/5/23 12:32 PM, Amit Kapila wrote: > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote: >> >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand >> <bertranddrouvot.pg@gmail.com> wrote: >>>> >>> >>> Maybe another option could be to have the walreceiver a way to let the slot sync >>> worker knows that it (the walreceiver) was not able to start due to non existing >>> replication slot on the primary? (that way we'd avoid the slot sync worker having >>> to talk to the primary). >> >> Few points: >> 1) I think if we do it, we should do it in generic way i.e. slotsync >> worker should go to no-op if walreceiver is not able to start due to >> any reason and not only due to invalid primary_slot_name. >> 2) Secondly, slotsync worker needs to make sure it has synced the >> slots so far i.e. worker should not go to no-op immediately on seeing >> missing WalRcv process if there are pending slots to be synced. >> > > Won't it be better to just ping and check the validity of > 'primary_slot_name' at the start of slot-sync and if it is changed > anytime? I think it would be better to avoid adding dependency on > walreciever state as that sounds like needless complexity. I think the overall extra complexity is linked to the fact that we first want to ensure that the slots are in sync before shutting down the sync slot worker. I think than talking to the primary or relying on the walreceiver state is "just" what would trigger the decision to shutdown the sync slot worker. Relying on the walreceiver state looks better to me (as it avoids possibly useless round trips with the primary). Also the walreceiver could be down for multiple reasons, and I think there is no point of having a sync slot worker running if the slots are in sync and there is no walreceiver running (even if primary_slot_name is a valid one). That said, I'm also ok with the "ping primary" approach if others have another point of view and find it better. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > On 12/5/23 12:32 PM, Amit Kapila wrote: > > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote: > >> > >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand > >> <bertranddrouvot.pg@gmail.com> wrote: > >>>> > >>> > >>> Maybe another option could be to have the walreceiver a way to let the slot sync > >>> worker knows that it (the walreceiver) was not able to start due to non existing > >>> replication slot on the primary? (that way we'd avoid the slot sync worker having > >>> to talk to the primary). > >> > >> Few points: > >> 1) I think if we do it, we should do it in generic way i.e. slotsync > >> worker should go to no-op if walreceiver is not able to start due to > >> any reason and not only due to invalid primary_slot_name. > >> 2) Secondly, slotsync worker needs to make sure it has synced the > >> slots so far i.e. worker should not go to no-op immediately on seeing > >> missing WalRcv process if there are pending slots to be synced. > >> > > > > Won't it be better to just ping and check the validity of > > 'primary_slot_name' at the start of slot-sync and if it is changed > > anytime? I think it would be better to avoid adding dependency on > > walreciever state as that sounds like needless complexity. > > I think the overall extra complexity is linked to the fact that we first > want to ensure that the slots are in sync before shutting down the > sync slot worker. > > I think than talking to the primary or relying on the walreceiver state > is "just" what would trigger the decision to shutdown the sync slot worker. > > Relying on the walreceiver state looks better to me (as it avoids possibly > useless round trips with the primary). > But the round trip will only be once in the beginning and if the user changes the GUC primary-slot_name which shouldn't be that often. > Also the walreceiver could be down for multiple reasons, and I think there > is no point of having a sync slot worker running if the slots are in sync and > there is no walreceiver running (even if primary_slot_name is a valid one). > I feel that is indirectly relying on the fact that the primary won't advance logical slots unless physical standby has consumed data. Now, it is possible that slot-sync worker lags behind and still needs to sync more data for slots in which it makes sense for slot-sync worker to be alive. I think we can try to avoid checking walreceiver status till we can get more data to avoid the problem I mentioned but it doesn't sound like a clean way to achieve our purpose. -- With Regards, Amit Kapila.
On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > On 12/5/23 12:32 PM, Amit Kapila wrote: > > > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.malik@gmail.com> wrote: > > >> > > >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand > > >> <bertranddrouvot.pg@gmail.com> wrote: > > >>>> > > >>> > > >>> Maybe another option could be to have the walreceiver a way to let the slot sync > > >>> worker knows that it (the walreceiver) was not able to start due to non existing > > >>> replication slot on the primary? (that way we'd avoid the slot sync worker having > > >>> to talk to the primary). > > >> > > >> Few points: > > >> 1) I think if we do it, we should do it in generic way i.e. slotsync > > >> worker should go to no-op if walreceiver is not able to start due to > > >> any reason and not only due to invalid primary_slot_name. > > >> 2) Secondly, slotsync worker needs to make sure it has synced the > > >> slots so far i.e. worker should not go to no-op immediately on seeing > > >> missing WalRcv process if there are pending slots to be synced. > > >> > > > > > > Won't it be better to just ping and check the validity of > > > 'primary_slot_name' at the start of slot-sync and if it is changed > > > anytime? I think it would be better to avoid adding dependency on > > > walreciever state as that sounds like needless complexity. > > > > I think the overall extra complexity is linked to the fact that we first > > want to ensure that the slots are in sync before shutting down the > > sync slot worker. > > > > I think than talking to the primary or relying on the walreceiver state > > is "just" what would trigger the decision to shutdown the sync slot worker. > > > > Relying on the walreceiver state looks better to me (as it avoids possibly > > useless round trips with the primary). > > > > But the round trip will only be once in the beginning and if the user > changes the GUC primary-slot_name which shouldn't be that often. > > > Also the walreceiver could be down for multiple reasons, and I think there > > is no point of having a sync slot worker running if the slots are in sync and > > there is no walreceiver running (even if primary_slot_name is a valid one). > > > > I feel that is indirectly relying on the fact that the primary won't > advance logical slots unless physical standby has consumed data. Yes, that is the basis of this discussion. But now on rethinking, if the user has not set 'standby_slot_names' on primary at first pace, then even if walreceiver on standby is down, slots on primary will keep on advancing and thus we need to sync. We have no check currently that mandates users to set standby_slot_names. > Now, > it is possible that slot-sync worker lags behind and still needs to > sync more data for slots in which it makes sense for slot-sync worker > to be alive. I think we can try to avoid checking walreceiver status > till we can get more data to avoid the problem I mentioned but it > doesn't sound like a clean way to achieve our purpose. >
Hi, On 12/6/23 7:18 AM, shveta malik wrote: > On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> I feel that is indirectly relying on the fact that the primary won't >> advance logical slots unless physical standby has consumed data. > > Yes, that is the basis of this discussion. Yes. > But now on rethinking, if > the user has not set 'standby_slot_names' on primary at first pace, > then even if walreceiver on standby is down, slots on primary will > keep on advancing Oh right, good point. > and thus we need to sync. Yes and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)" in synchronize_slots() prevents us to do so (as I think WalRcv->latestWalEnd would be invalid for a non started walreceiver). > We have no check currently > that mandates users to set standby_slot_names. > Yeah and OTOH unset standby_slot_names is currently the only way for users to "force" advance failover slots if they want to (in case say the standby is down for a long time and they don't want to block logical decoding on the primary) as we don't provide a way to alter the failover property (unless connecting with replication which sounds more like a hack). >> Now, >> it is possible that slot-sync worker lags behind and still needs to >> sync more data for slots in which it makes sense for slot-sync worker >> to be alive. Right. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
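For readers following the thread, the guard being referred to amounts to roughly the following; only WalRcv and its fields are existing symbols, the helper name is made up and this is a paraphrase of the check discussed, not the patch's exact code.

/* Sketch only: skip a sync cycle until the walreceiver has reported any WAL. */
#include "postgres.h"
#include "access/xlogdefs.h"
#include "replication/walreceiver.h"
#include "storage/spin.h"

static bool
walrcv_has_reported_wal(void)
{
	XLogRecPtr	latest;

	SpinLockAcquire(&WalRcv->mutex);
	latest = WalRcv->latestWalEnd;
	SpinLockRelease(&WalRcv->mutex);

	/* invalid until the walreceiver has connected and received something */
	return !XLogRecPtrIsInvalid(latest);
}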
On Wed, Dec 6, 2023 at 3:00 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 12/6/23 7:18 AM, shveta malik wrote: > > On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> I feel that is indirectly relying on the fact that the primary won't > >> advance logical slots unless physical standby has consumed data. > > > > Yes, that is the basis of this discussion. > > Yes. > > > But now on rethinking, if > > the user has not set 'standby_slot_names' on primary at first pace, > > then even if walreceiver on standby is down, slots on primary will > > keep on advancing > > Oh right, good point. > > > and thus we need to sync. > > Yes and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)" > in synchronize_slots() prevents us to do so (as I think WalRcv->latestWalEnd > would be invalid for a non started walreceiver). > But I think we do not need to deal with the case that walreceiver is not started at all on standby. It is always started. Walreceiver not getting started or down for long is a rare scenario. We have other checks too for 'latestWalEnd' in slotsync worker and I think we should retain those as is. > > We have no check currently > > that mandates users to set standby_slot_names. > > > > Yeah and OTOH unset standby_slot_names is currently the only > way for users to "force" advance failover slots if they want to (in case > say the standby is down for a long time and they don't want to block logical decoding > on the primary) as we don't provide a way to alter the failover property > (unless connecting with replication which sounds more like a hack). > yes, right. > >> Now, > >> it is possible that slot-sync worker lags behind and still needs to > >> sync more data for slots in which it makes sense for slot-sync worker > >> to be alive. > > Right. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Wed, Dec 6, 2023 at 4:28 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Dec 6, 2023 at 3:00 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On 12/6/23 7:18 AM, shveta malik wrote: > > > On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > >> I feel that is indirectly relying on the fact that the primary won't > > >> advance logical slots unless physical standby has consumed data. > > > > > > Yes, that is the basis of this discussion. > > > > Yes. > > > > > But now on rethinking, if > > > the user has not set 'standby_slot_names' on primary at first pace, > > > then even if walreceiver on standby is down, slots on primary will > > > keep on advancing > > > > Oh right, good point. > > > > > and thus we need to sync. > > > > Yes and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)" > > in synchronize_slots() prevents us to do so (as I think WalRcv->latestWalEnd > > would be invalid for a non started walreceiver). > > > > But I think we do not need to deal with the case that walreceiver is > not started at all on standby. It is always started. Walreceiver not > getting started or down for long is a rare scenario. We have other > checks too for 'latestWalEnd' in slotsync worker and I think we should > retain those as is. > > > > We have no check currently > > > that mandates users to set standby_slot_names. > > > > > > > Yeah and OTOH unset standby_slot_names is currently the only > > way for users to "force" advance failover slots if they want to (in case > > say the standby is down for a long time and they don't want to block logical decoding > > on the primary) as we don't provide a way to alter the failover property > > (unless connecting with replication which sounds more like a hack). > > > > yes, right. > > > >> Now, > > >> it is possible that slot-sync worker lags behind and still needs to > > >> sync more data for slots in which it makes sense for slot-sync worker > > >> to be alive. > > > > Right. > > > > Regards, > > > > -- > > Bertrand Drouvot > > PostgreSQL Contributors Team > > RDS Open Source Databases > > Amazon Web Services: https://aws.amazon.com PFA v43, changes are: v43-001: 1) Support of 'failover' dump in pg_dump. It was missing earlier. v43-002: 1) Slot-sync worker now checks validity of primary_slot_name by connecting to primary, once during its start and later if primary_slot_name GUC is changed. 2) Doc improvement (see logicaldecoding.sgml). More details on overall slot-sync feature is added along with Nisha's comment of documenting disabled-subscription behaviour wrt to synced slots. thanks Shveta
Attachment
Hi, On 12/6/23 11:58 AM, shveta malik wrote: > On Wed, Dec 6, 2023 at 3:00 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> >> Hi, >> >> On 12/6/23 7:18 AM, shveta malik wrote: >>> On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> >>>> I feel that is indirectly relying on the fact that the primary won't >>>> advance logical slots unless physical standby has consumed data. >>> >>> Yes, that is the basis of this discussion. >> >> Yes. >> >>> But now on rethinking, if >>> the user has not set 'standby_slot_names' on primary at first pace, >>> then even if walreceiver on standby is down, slots on primary will >>> keep on advancing >> >> Oh right, good point. >> >>> and thus we need to sync. >> >> Yes and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)" >> in synchronize_slots() prevents us to do so (as I think WalRcv->latestWalEnd >> would be invalid for a non started walreceiver). >> > > But I think we do not need to deal with the case that walreceiver is > not started at all on standby. It is always started. Walreceiver not > getting started or down for long is a rare scenario. We have other > checks too for 'latestWalEnd' in slotsync worker and I think we should > retain those as is. > Agree to not deal with the walreceiver being down for now (we can still improve that part later if we encounter the case in the real world). Might be worth adding comments in the code (around the WalRcv->latestWalEnd checks) that no "lagging" syncs are possible if the walreceiver is not started, though? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 12/6/23 12:23 PM, shveta malik wrote: > On Wed, Dec 6, 2023 at 4:28 PM shveta malik <shveta.malik@gmail.com> wrote: > > > PFA v43, changes are: Thanks! > > v43-001: > 1) Support of 'failover' dump in pg_dump. It was missing earlier. > > v43-002: > 1) Slot-sync worker now checks validity of primary_slot_name by > connecting to primary, once during its start and later if > primary_slot_name GUC is changed. I gave it a second thought and yeah, that sounds like the best option (as compared to relying on the walreceiver being up or down). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi. Here are my review comments for patch v43-0002. ====== Commit message 1. The nap time of worker is tuned according to the activity on the primary. The worker starts with nap time of 10ms and if no activity is observed on the primary for some time, then nap time is increased to 10sec. And if activity is observed again, nap time is reduced back to 10ms. ~ /nap time of worker/nap time of the worker/ /And if/If/ ~~~ 2. Slots synced on the standby can be identified using 'sync_state' column of pg_replication_slots view. The values are: 'n': none for user slots, 'i': sync initiated for the slot but waiting for the remote slot on the primary server to catch up. 'r': ready for periodic syncs. ~ /identified using/identified using the/ The meaning of "identified by" is unclear to me. It also seems to clash with later descriptions in system-views.sgml. Please see my later review comment about it (in the sgml file) ====== doc/src/sgml/bgworker.sgml 3. bgw_start_time is the server state during which postgres should start the process; it can be one of BgWorkerStart_PostmasterStart (start as soon as postgres itself has finished its own initialization; processes requesting this are not eligible for database connections), BgWorkerStart_ConsistentState (start as soon as a consistent state has been reached in a hot standby, allowing processes to connect to databases and run read-only queries), and BgWorkerStart_RecoveryFinished (start as soon as the system has entered normal read-write state. Note that the BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are equivalent in a server that's not a hot standby), and BgWorkerStart_ConsistentState_HotStandby (same meaning as BgWorkerStart_ConsistentState but it is more strict in terms of the server i.e. start the worker only if it is hot-standby; if it is consistent state in non-standby, worker will not be started). Note that this setting only indicates when the processes are to be started; they do not stop when a different state is reached. ~ 3a. This seems to have grown to become just one enormous sentence that is too hard to read. IMO this should be changed to be a <variablelist> of possible values instead of a big slab of text. I suspect it could also be simplified quite a lot -- something like below SUGGESTION bgw_start_time is the server state during which postgres should start the process. Note that this setting only indicates when the processes are to be started; they do not stop when a different state is reached. Possible values are: - BgWorkerStart_PostmasterStart (start as soon as postgres itself has finished its own initialization; processes requesting this are not eligible for database connections) - BgWorkerStart_ConsistentState (start as soon as a consistent state has been reached in a hot-standby, allowing processes to connect to databases and run read-only queries) - BgWorkerStart_RecoveryFinished (start as soon as the system has entered normal read-write state. Note that the BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are equivalent in a server that's not a hot standby) - BgWorkerStart_ConsistentState_HotStandby (same meaning as BgWorkerStart_ConsistentState but it is more strict in terms of the server i.e. start the worker only if it is hot-standby; if it is a consistent state in non-standby, the worker will not be started). ~~~ 3b. "i.e. 
start the worker only if it is hot-standby; if it is consistent state in non-standby, worker will not be started" ~ Why is it even necessary to say the 2nd part "if it is consistent state in non-standby, worker will not be started". It seems redundant given 1st part says the same, right? ====== doc/src/sgml/config.sgml 4. + <para> + The standbys corresponding to the physical replication slots in + <varname>standby_slot_names</varname> must enable + <varname>enable_syncslot</varname> for the standbys to receive + failover logical slots changes from the primary. + </para> 4a. Somehow "must enable enable_syncslot" seemed strange. Maybe re-word like: "must enable slot synchronization (see enable_syncslot)" OR "must configure enable_syncslot = true" ~~~ 4b. (seems like repetitive use of "the standbys") /for the standbys to/to/ OR /for the standbys to/so they can/ ~~~ 5. <varname>primary_conninfo</varname> string, or in a separate - <filename>~/.pgpass</filename> file on the standby server (use + <filename>~/.pgpass</filename> file on the standby server. (use This rearranged period seems unrelated to the current patch. Maybe don't touch this. ~~~ 6. + <para> + Specify <literal>dbname</literal> in + <varname>primary_conninfo</varname> string to allow synchronization + of slots from the primary server to the standby server. + This will only be used for slot synchronization. It is ignored + for streaming. </para> The wording "to allow synchronization of slots" seemed misleading to me. Isn't that more the purpose of the 'enable_syncslot' GUC? I think the intended wording is more like below: SUGGESTION If slot synchronization is enabled then it is also necessary to specify <literal>dbname</literal> in the <varname>primary_conninfo</varname> string. This will only be used for slot synchronization. It is ignored for streaming. ====== doc/src/sgml/logicaldecoding.sgml 7. + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the failover option during slot creation and set + <varname>enable_syncslot</varname> on the standby. For the synchronization + to work, it is mandatory to have physical replication slot between the + primary and the standby. This physical replication slot for the standby + should be listed in <varname>standby_slot_names</varname> on the primary + to prevent the subscriber from consuming changes faster than the hot + standby. Additionally, similar to creating a logical replication slot + on the hot standby, <varname>hot_standby_feedback</varname> should be + set on the standby and a physical slot between the primary and the standby + should be used. + </para> 7a. /creation and set/creation and setting/ /to have physical replication/to have a physical replication/ ~ 7b. It's unclear why this is saying "should be listed in standby_slot_names" and "hot_standby_feedback should be set on the standby". Why is it saying "should" instead of MUST -- are these optional? I thought the GUC validation function mandates these (???). ~ 7c. Why does the paragraph say "and a physical slot between the primary and the standby should be used."; isn't that exactly what was already written earlier ("For the synchronization to work, it is mandatory to have physical replication slot between the primary and the standby" ~~~ 8. 
+ <para> + By enabling synchronization of slots, logical replication can be resumed + after failover depending upon the + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield> + for the synchronized slots on the standby at the time of failover. + The slots which were in ready sync_state ('r') on the standby before + failover can be used for logical replication after failover. However, + the slots which were in initiated sync_state ('i) and were not + sync-ready ('r') at the time of failover will be dropped and logical + replication for such slots can not be resumed after failover. This applies + to the case where a logical subscription is disabled before failover and is + enabled after failover. If the synchronized slot due to disabled + subscription could not be made sync-ready ('r') on standby, then the + subscription can not be resumed after failover even when enabled. 8a. This feels overcomplicated -- too much information? SUGGESTION depending upon the ... sync_state for the synchronized slots on the standby at the time of failover. Only slots that were in ready sync_state ('r') on the standby before failover can be used for logical replication after failover ~~~ 8b. + the slots which were in initiated sync_state ('i) and were not + sync-ready ('r') at the time of failover will be dropped and logical + replication for such slots can not be resumed after failover. This applies + to the case where a logical subscription is disabled before failover and is + enabled after failover. If the synchronized slot due to disabled + subscription could not be made sync-ready ('r') on standby, then the + subscription can not be resumed after failover even when enabled. But isn't ALL that part pretty much redundant information for the user? I thought these are not ready state, so they are not usable... End-Of-Story. Isn't everything else just more like implementation details, which the user does not need to know about? ~~~ 9. + If the primary is idle, making the synchronized slot on the standby + as sync-ready ('r') for enabled subscription may take noticeable time. + This can be sped up by calling the + <function>pg_log_standby_snapshot</function> function on the primary. + </para> SUGGESTION If the primary is idle, then the synchronized slots on the standby may take a noticeable time to reach the ready ('r') sync_state. This can be sped up by calling the <function>pg_log_standby_snapshot</function> function on the primary. ====== doc/src/sgml/system-views.sgml 10. + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>sync_state</structfield> <type>char</type> + </para> + <para> + Defines slot synchronization state. This is meaningful on the physical + standby which has enabled slots synchronization. + </para> I felt that this part "which has enabled slots synchronization" should cross-reference to the 'sync_enabled' GUC. ~~~ 11. + <para> + State code: + <literal>n</literal> = none for user created slots, + <literal>i</literal> = sync initiated for the slot but slot is not ready + yet for periodic syncs, + <literal>r</literal> = ready for periodic syncs. + </para> I'm wondering why don't we just reuse 'd' (disabled), 'p' (pending), 'e' (enabled) like the other tri-state attributes are using. ~~~ 12. + <para> + The hot standby can have any of these sync_state for the slots but on a + hot standby, the slots with state 'r' and 'i' can neither be used for logical + decoded nor dropped by the user. 
The primary server will have sync_state + as 'n' for all the slots. But if the standby is promoted to become the + new primary server, sync_state can be seen 'r' as well. On this new + primary server, slots with sync_state as 'r' and 'n' will behave the same. + </para></entry> + </row> 12a. /logical decoded/logical decoding/ ~ 12b. "sync_state as 'r' and 'n' will behave the same" sounds kind of hacky. Is there no alternative? Anyway, IMO mentioning about primary server states seems overkill, because you already said "This is meaningful on the physical standby" which I took as implying that it is *not* meaningful from the POV of the primary server. In light of this, I'm wondering if a better name for this attribute would be: 'standby_sync_state' ====== src/backend/access/transam/xlogrecovery.c 13. + /* + * Shutdown the slot sync workers to prevent potential conflicts between + * user processes and slotsync workers after a promotion. Additionally, + * drop any slots that have initiated but not yet completed the sync + * process. + */ + ShutDownSlotSync(); + slotsync_drop_initiated_slots(); + Is this where maybe the 'sync_state' should also be updated for everything so you are not left with confusion about different states on a node that is no longer a standby node? ====== src/backend/postmaster/postmaster.c 14. PostmasterMain ApplyLauncherRegister(); + SlotSyncWorkerRegister(); + Every other function call here is heavily commented but there is a conspicuous absence of a comment here. ~~~ 15. bgworker_should_start_now if (start_time == BgWorkerStart_ConsistentState) return true; + else if (start_time == BgWorkerStart_ConsistentState_HotStandby && + pmState != PM_RUN) + return true; /* fall through */ Change "else if" to "if" would be simpler. ====== .../libpqwalreceiver/libpqwalreceiver.c 16. + for (opt = opts; opt->keyword != NULL; ++opt) + { + /* + * If multiple dbnames are specified, then the last one will be + * returned + */ + if (strcmp(opt->keyword, "dbname") == 0 && opt->val && + opt->val[0] != '\0') + dbname = pstrdup(opt->val); + } This can use a tidier C99 style to declare 'opt' as the loop variable. ~~~ 17. static void libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, - bool failover) + bool failover) What is this change for? Or, if something is wrong with the indent then anyway it should be fixed in patch 0001. ====== src/backend/replication/logical/logical.c 18. + /* + * Slots in state SYNCSLOT_STATE_INITIATED should have been dropped on + * promotion. + */ + if (!RecoveryInProgress() && slot->data.sync_state == SYNCSLOT_STATE_INITIATED) + elog(ERROR, "replication slot \"%s\" was not synced completely from the primary server", + NameStr(slot->data.name)); + + /* + * Do not allow consumption of a "synchronized" slot until the standby + * gets promoted. + */ + if (RecoveryInProgress() && slot->data.sync_state != SYNCSLOT_STATE_NONE) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot use replication slot \"%s\" for logical decoding", + NameStr(slot->data.name)), + errdetail("This slot is being synced from the primary server."), + errhint("Specify another replication slot."))); + 18a. Instead of having !RecoveryInProgress() and RecoveryInProgress() in separate conditions is the code simpler like: SUGGESTION if (RecoveryInProgress()) { /* Do not allow ... */ if (slot->data.sync_state != SYNCSLOT_STATE_NONE) ... } else { /* Slots in state... */ if (slot->data.sync_state == SYNCSLOT_STATE_INITIATED) ... } ~ 18b. 
Should the errdetail give the current state? ====== src/backend/replication/logical/slotsync.c 19. +/* + * Number of attempts for wait_for_primary_slot_catchup() after + * which it aborts the wait and the slot sync worker then moves + * to the next slot creation/sync. + */ +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 Given this is only used within one static function, I'm wondering if it would be tidier to also move this macro to within that function. ~~~ 20. wait_for_primary_slot_catchup +/* + * Wait for remote slot to pass locally reserved position. + * + * Ping and wait for the primary server for + * WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS during a slot creation, if it still + * does not catch up, abort the wait. The ones for which wait is aborted will + * attempt the wait and sync in the next sync-cycle. + * + * *persist will be set to false if the slot has disappeared or was invalidated + * on the primary; otherwise, it will be set to true. + */ 20a. The comment doesn't say the meaning of the boolean returned. ~ 20b. /*persist will be set/If passed, *persist will be set/ ~~~ 21. + appendStringInfo(&cmd, + "SELECT conflicting, restart_lsn, confirmed_flush_lsn," + " catalog_xmin FROM pg_catalog.pg_replication_slots" + " WHERE slot_name = %s", + quote_literal_cstr(remote_slot->name)); Somehow, I felt it is more readable if the " FROM" starts on a new line. e.g. "SELECT conflicting, restart_lsn, confirmed_flush_lsn, catalog_xmin" " FROM pg_catalog.pg_replication_slots" " WHERE slot_name = %s" ~~~ 22. + ereport(ERROR, + (errmsg("could not fetch slot info for slot \"%s\" from the" + " primary server: %s", + remote_slot->name, res->err))); Perhaps the message can be shortened like: "could not fetch slot \"%s\" info from the primary server: %s" ~~~ 23. + ereport(WARNING, + (errmsg("slot \"%s\" disappeared from the primary server," + " slot creation aborted", remote_slot->name))); Would this be better split into parts? SUGGESTION errmsg "slot \"%s\" creation aborted" errdetail "slot was not found on the primary server" ~~~ 24. + ereport(WARNING, + (errmsg("slot \"%s\" invalidated on the primary server," + " slot creation aborted", remote_slot->name))); (similar to previous) SUGGESTION errmsg "slot \"%s\" creation aborted" errdetail "slot was invalidated on the primary server" ~~~ 25. + /* + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin + * are expected to be valid/non-null. + */ SUGGESTION Having got a valid restart_lsn, the confirmed_lsn and catalog_xmin are expected to be valid/non-null. ~~~ 26. slotsync_drop_initiated_slots +/* + * Drop the slots for which sync is initiated but not yet completed + * i.e. they are still waiting for the primary server to catch up. + */ I found "waiting for the primary server to catch up" to be difficult to understand without knowing the full details, but it is not really described properly until a much larger comment that is buried in the synchronize_one_slot(). So I think all this needs explanation up-front in the file, which you can refer to. I have repeated this same review comment in a couple of places. ~~~ 27. get_local_synced_slot_names +static List * +get_local_synced_slot_names(void) +{ + List *localSyncedSlots = NIL; 27a. It's not returning a list of "names" though, so is this an appropriate function name? ~~~ 27b. Suggest just call that ('localSyncedSlots') differently. 
- In slotsync_drop_initiated_slots() function they are just called 'slots' - In drop_obsolete_slots() function it is called 'local_slot_list' IMO it is better if all these are consistently named -- just all lists 'slots' or all 'local_slots' or whatever. ~~~ 28. check_sync_slot_validity +static bool +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, + bool *locally_invalidated) Somehow this wording "validity" seems like a misleading function name, because the return value has nothing to do with the slot field invalidated. The validity/locally_invalidated stuff is a secondary return as a side effect for the "true" case. A more accurate function name would be more like check_sync_slot_on_remote(). ~~~ 29. check_sync_slot_validity +static bool +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, + bool *locally_invalidated) +{ + ListCell *cell; There is inconsistent naming -- ListCell lc; ListCell cell; ListCell lc_slot; etc.. IMO the more complicated names aren't of much value -- probably everything can be changed to 'lc' for consistency. ~~~ 30. drop_obsolete_slots + /* + * Get the list of local 'synced' slot so that those not on remote could + * be dropped. + */ /slot/slots/ Also, I don't think it is necessary to say "so that those not on remote could be dropped." -- That is already described in the function comment and again in a comment later in the loop. That seems enough. If the function name get_local_synced_slot_names() is improved a bit the comment seems redundant because it is obvious from the function name. ~~~ 31. + foreach(lc_slot, local_slot_list) + { + ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc_slot); + bool local_exists = false; + bool locally_invalidated = false; + + local_exists = check_sync_slot_validity(local_slot, remote_slot_list, + &locally_invalidated); Shouldn't that 'local_exists' variable be called 'remote_exists'? That's what the other comments seem to be saying. ~~~ 32. construct_slot_query + appendStringInfo(s, + "SELECT slot_name, plugin, confirmed_flush_lsn," + " restart_lsn, catalog_xmin, two_phase, failover," + " database, pg_get_slot_invalidation_cause(slot_name)" + " FROM pg_catalog.pg_replication_slots" + " WHERE failover and sync_state != 'i'"); Just wondering if substituting the SYNCSLOT_STATE_INITIATED constant here might be more appropriate than hardwiring 'i'. Why have a constant but not use it? ~~~ 33. synchronize_one_slot +static void +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) +{ + ReplicationSlot *s; + char sync_state = 0; 33a. It seems strange that the sync_state is initially assigned something other than the 3 legal values. Should this be defaulting to SYNCSLOT_STATE_NONE instead? ~ 33b. I think it is safer to default the *slot_updated = false; because the code appears to assume it was false already which may or may not be true. ~~~ 34. + /* + * Make sure that concerned WAL is received before syncing slot to target + * lsn received from the primary server. + * + * This check should never pass as on the primary server, we have waited + * for the standby's confirmation before updating the logical slot. + */ Maybe this comment should mention up-front that it is just a "Sanity check:" ~~~ 35. + /* + * With hot_standby_feedback enabled and invalidations handled + * apropriately as above, this should never happen. 
+ */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) + { + ereport(ERROR, + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn))); + + goto cleanup; + } 35a. IIUC then this another comment that should say it is just a "Sanity-check:". ~ 35b. I was wondering if there should be Assert(hot_standby_feedback) here also. The comment "With hot_standby_feedback enabled" is a bit vague whereas including an Assert will clarify that it must be set. ~ 35c. Since it says "this should never happen" then it appears elog is more appropriate than ereport because translations are not needed, right? ~ 35d. The ERROR will make that goto cleanup unreachable, won't it? ~~~ 36. + /* + * Already existing slot but not ready (i.e. waiting for the primary + * server to catch-up), lets attempt to make it sync-ready now. + */ /lets/let's/ ~~~ 37. + /* + * Refer the slot creation part (last 'else' block) for more details + * on this wait. + */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || + TransactionIdPrecedes(remote_slot->catalog_xmin, + MyReplicationSlot->data.catalog_xmin)) + { + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL)) + { + goto cleanup; + } + } 37a. Having to jump forward to understand earlier code seems backward. IMO there should be a big comment atop this module about this subject which the comment here can just refer to. I will write more about this topic later (below). ~ 37b. The extra code curly braces are not needed. ~~~ 38. + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " + "now", remote_slot->name)); Better to put the whole errmsg() on a newline instead of splitting the string like that. ~~~ 39. + /* User created slot with the same name exists, raise ERROR. */ + else if (sync_state == SYNCSLOT_STATE_NONE) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby"))); + } I felt it would be better to eliminate this case immediately up-front when you first searched for the slot names. e.g. code like below. IIUC this refactor also means the default sync_state can be assigned a normal value (as I suggested above) instead of the strange assignment to 0. + /* Search for the named slot */ + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) + { + SpinLockAcquire(&s->mutex); + sync_state = s->data.sync_state; + SpinLockRelease(&s->mutex); INSERT HERE + /* User-created slot with the same name exists, raise ERROR. */ + if (sync_state == SYNCSLOT_STATE_NONE) + ereport(ERROR, ... + } ~~~ 40. + /* Otherwise create the slot first. */ + else + { Insert a blank line above that comment for better readability (same as done for earlier 'else' in this same function) ~~~ 41. + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, + remote_slot->two_phase, + remote_slot->failover, + SYNCSLOT_STATE_INITIATED); + + slot = MyReplicationSlot; In hindsight, the prior if/else code blocks in this function also could have done "slot = MyReplicationSlot;" same as this -- then the code would be much less verbose. ~~~ 42. 
+ SpinLockAcquire(&slot->mutex); + slot->data.database = get_database_oid(remote_slot->database, false); + + namestrcpy(&slot->data.plugin, remote_slot->plugin); + SpinLockRelease(&slot->mutex); IMO the code would be more readable *without* a blank line here because the mutexed block is more obvious. ~~~ 43. + /* + * If the local restart_lsn and/or local catalog_xmin is ahead of + * those on the remote then we cannot create the local slot in sync + * with the primary server because that would mean moving the local + * slot backwards and we might not have WALs retained for old LSN. In + * this case we will wait for the primary server's restart_lsn and + * catalog_xmin to catch up with the local one before attempting the + * sync. + */ 43a. This comment describes some fundamental concepts about how this logic works. I felt this and other comments like this should be at the top of this slotsync.c file. Then anything that needs to mention about it can refer to the top comment. For example, I also found other comments like "... they are still waiting for the primary server to catch up." to be difficult to understand without knowing these details, but I think describing core design stuff up-front and saying "refer to the comment atop the fil" probably would help a lot. ~ 43b. Should "wait for the primary server's restart_lsn and..." be "wait for the primary server slot's restart_lsn and..." ? ~~~ 44. + { + bool persist; + + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, &persist)) + { + /* + * The remote slot didn't catch up to locally reserved + * position. + * + * We do not drop the slot because the restart_lsn can be + * ahead of the current location when recreating the slot in + * the next cycle. It may take more time to create such a + * slot. Therefore, we persist it (provided remote-slot is + * still valid) and attempt the wait and synchronization in + * the next cycle. + */ + if (persist) + { + ReplicationSlotPersist(); + *slot_updated = true; + } + + goto cleanup; + } + } Looking at the way this 'persist' parameter is used I felt is it too complicated. IIUC the wait_for_primary_slot_catchup can only return *persist = true (for a false return) when it has reached/exceeded the number of retries and still not yet caught up. Why should wait_for_primary_slot_catchup() pretend to know about persistence? In other words, I thought a more meaningful parameter/variable name (instead of 'persist') is something like 'wait_attempts_exceeded'. IMO that will make wait_for_primary_slot_catchup() code easier, and here you can just say like below, where the code matches the comment better. Thoughts? + if (wait_attempts_exceeded) + { + ReplicationSlotPersist(); + *slot_updated = true; + } ~~~ 45. + + + /* + * Wait for primary is either not needed or is over. Update the lsns + * and mark the slot as READY for further syncs. + */ Double blank lines? ~~~ 46. + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " + "now", remote_slot->name)); + } + +cleanup: Better to put the whole errmsg() on a newline instead of splitting the string like that. ~~~ 47. synchronize_slots +/* + * Synchronize slots. + * + * Gets the failover logical slots info from the primary server and update + * the slots locally. Creates the slots if not present on the standby. + * + * Returns nap time for the next sync-cycle. + */ +static long +synchronize_slots(WalReceiverConn *wrconn) /update/updates/ ~~~ 48. 
+ /* The primary_slot_name is not set yet or WALs not received yet */ + SpinLockAcquire(&WalRcv->mutex); + if (!WalRcv || + (WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) + { + SpinLockRelease(&WalRcv->mutex); + return naptime; + } + SpinLockRelease(&WalRcv->mutex); Just wondering if the scenario of "WALS not received" is a bit more like "no activity" so perhaps the naptime returned should be WORKER_INACTIVITY_NAPTIME_MS here? ~~~ 49. + /* Construct query to get slots info from the primary server */ + initStringInfo(&s); + construct_slot_query(&s); I did not like the construct_slot_query() to be separated from this function because it makes it too difficult to see if the slot_attr numbers and column types in this function are correct w.r.t. that query. IMO better when everything is in the same place where you can see it all together. e.g. Less risk of breaking something if changes are made. ~~~ 50. + /* Construct the remote_slot tuple and synchronize each slot locally */ + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); Normally in all the other functions the variable 'slot' was the local ReplicationSlot but IIUC here represents a remote tuple. Making a different name would be better like 'remote_slottup' or something else. ~~~ 51. + /* + * If any of the slots get updated in this sync-cycle, retain default + * naptime and update 'last_update_time' in slot sync worker. But if no + * activity is observed in this sync-cycle, then increase naptime provided + * inactivity time reaches threshold. + */ I think "retain" is a slightly wrong word here because it might have been WORKER_INACTIVITY_NAPTIME_MS in the previous cycle. Maybe just /retain/use/ ~~~ 52. +/* + * Connects primary to validate the slot specified in primary_slot_name. + * + * Exits the worker if physical slot with the specified name does not exist. + */ +static void +validate_primary_slot(WalReceiverConn *wrconn) There is already a connection, so not sure if this connect should be saying "connects to"; Maybe is should be saying more like below: SUGGESTION Using the specified primary server connection, validate if the physical slot identified by GUC primary_slot_name exists. Exit the worker if the slot is not found. ~~~ 53. + initStringInfo(&cmd); + appendStringInfo(&cmd, + "select count(*) = 1 from pg_replication_slots where " + "slot_type='physical' and slot_name=%s", + quote_literal_cstr(PrimarySlotName)); Write the SQL keywords in uppercase. ~~~ 54. + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch primary_slot_name info from the " + "primary: %s", res->err))); Shouldn't the name of the unfound slot be shown in the ereport, or will that already appear in the res->err? ~~~ 55. + ereport(ERROR, + errmsg("exiting slots synchronization as slot specified in " + "primary_slot_name is not valid")); + IMO the format should be the same as I suggested (later) for all the validate_slotsync_parameters() errors. Also, I think the name of the unfound slot needs to be in this message. So maybe result is like this: SUGGESTION ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: second %s is a GUC variable name */ errhint("The primary slot \"%s\" specified by %s is not valid.", slot_name, "primary_slot_name") ); ~~~ 56. 
+/* + * Checks if GUCs are set appropriately before starting slot sync worker + */ +static void +validate_slotsync_parameters(char **dbname) +{ + /* + * Since 'enable_syncslot' is ON, check that other GUC settings + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) + * are compatible with slot synchronization. If not, raise ERROR. + */ + 56a. I thought that 2nd comment sort of belonged in the function comment. ~ 56b. It says "Since 'enable_syncslot' is ON", but I IIUC that is wrong because the other function slotsync_reread_config() might detect a change in this GUC and cause this validate_slotsync_parameters() to be called when enable_syncslot was changed to false. In other words, I think you also need to check 'enable_syncslot' and exit with appropriate ERROR same as all the other config problems. OTOH if this is not possible, then the slotsync_reread_config() might need fixing instead. ~~~ 57. + /* + * A physical replication slot(primary_slot_name) is required on the + * primary to ensure that the rows needed by the standby are not removed + * after restarting, so that the synchronized slot on the standby will not + * be invalidated. + */ + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) + ereport(ERROR, + errmsg("exiting slots synchronization as primary_slot_name is " + "not set")); + + /* + * Hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. + */ + if (!hot_standby_feedback) + ereport(ERROR, + errmsg("exiting slots synchronization as hot_standby_feedback " + "is off")); + + /* + * Logical decoding requires wal_level >= logical and we currently only + * synchronize logical slots. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + ereport(ERROR, + errmsg("exiting slots synchronisation as it requires " + "wal_level >= logical")); + + /* + * The primary_conninfo is required to make connection to primary for + * getting slots information. + */ + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) + ereport(ERROR, + errmsg("exiting slots synchronization as primary_conninfo " + "is not set")); + + /* + * The slot sync worker needs a database connection for walrcv_exec to + * work. 
+ */ + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (*dbname == NULL) + ereport(ERROR, + errmsg("exiting slots synchronization as dbname is not " + "specified in primary_conninfo")); + +} IMO all these errors can be improved by: - using a common format - including errhint for the reason - using the same tone for instructions on what to do (e.g saying must be set, rather than what was not set) SUGGESTION (something like this) ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: %s is a GUC variable name */ errhint("%s must be defined.", "primary_slot_name") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: %s is a GUC variable name */ errhint("%s must be enabled.", "hot_standby_feedback") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: wal_level is a GUC variable name, 'logical' is a value */ errhint("wal_level must be >= logical.") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: %s is a GUC variable name */ errhint("%s must be defined.", "primary_conninfo") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: 'dbname' is a specific option; %s is a GUC variable name */ errhint("'dbname' must be specified in %s.", "primary_conninfo") ); ~~~ 58. + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (*dbname == NULL) + ereport(ERROR, + errmsg("exiting slots synchronization as dbname is not specified in primary_conninfo")); + +} Unnecessary blank line at the end of the function ~~~ 59. +/* + * Re-read the config file. + * + * If any of the slot sync GUCs changed, validate the values again + * through validate_slotsync_parameters() which will exit the worker + * if validaity fails. + */ SUGGESTION If any of the slot sync GUCs have changed, re-validate them. The worker will exit if the check fails. ~~~ 60. + char *conninfo = pstrdup(PrimaryConnInfo); + char *slotname = pstrdup(PrimarySlotName); + bool syncslot = enable_syncslot; + bool standbyfeedback = hot_standby_feedback; For clarity, I would have used var names to match the old GUCs. e.g. /conninfo/old_primary_conninfo/ /slotname/old_primary_slot_name/ /syncslot/old_enable_syncslot/ /standbyfeedback/old_hot_standby_feedback/ ~~~ 61. + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + Assert(dbname); This code seems premature. IIUC this is only needed to detect that the dbname was changed. But I think the prerequisite is first that the conninfoChanged is true. So really this code should be guarded by if (conninfoChanged) so it can be done later in the function. ~~~ 62. + if (conninfoChanged || slotnameChanged || + (syncslot != enable_syncslot) || + (standbyfeedback != hot_standby_feedback)) + { + revalidate = true; + } SUGGESTION revalidate = conninfoChanged || slotnameChanged || (syncslot != enable_syncslot) || (standbyfeedback != hot_standby_feedback); ~~~ 63. + /* + * Since we have initialized this worker with old dbname, thus exit if + * dbname changed. Let it get restarted and connect to new dbname + * specified. + */ + if (conninfoChanged && strcmp(dbname, new_dbname) != 0) + { + ereport(ERROR, + errmsg("exiting slot sync woker as dbname in " + "primary_conninfo changed")); + } 63a. /old dbname/the old dbname/ /new dbname/the new dbname/ /woker/worker/ ~ 63b. This code feels awkward. 
Can't this dbname check and accompanying ERROR message be moved down into validate_slotsync_parameters(), so it lives along with all the other GUC validation logic? Maybe you'll need to change the validate_slotsync_parameters() parameters slightly but I think it is much better to keep all the validation together. ~~~ 64. + + +/* + * Interrupt handler for main loop of slot sync worker. + */ +static void +ProcessSlotSyncInterrupts(WalReceiverConn **wrconn) Double blank lines. ~~~ 65. + + + if (ConfigReloadPending) + slotsync_reread_config(); +} Double blank lines ~~~ 66. slotsync_worker_onexit +static void +slotsync_worker_onexit(int code, Datum arg) +{ + SpinLockAcquire(&SlotSyncWorker->mutex); + SlotSyncWorker->pid = 0; + SpinLockRelease(&SlotSyncWorker->mutex); +} Should assignment use InvalidPid (-1) instead of 0? ~~~ 67. ReplSlotSyncWorkerMain + SpinLockAcquire(&SlotSyncWorker->mutex); + + Assert(SlotSyncWorker->pid == 0); + + /* Advertise our PID so that the startup process can kill us on promotion */ + SlotSyncWorker->pid = MyProcPid; + + SpinLockRelease(&SlotSyncWorker->mutex); Shouldn't pid start as InvalidPid (-1) instead of Assert 0? ~~~ 68. + /* Connect to the primary server */ + wrconn = remote_connect(); + + /* + * Connect to primary and validate the slot specified in + * primary_slot_name. + */ + validate_primary_slot(wrconn); Maybe needs some slight rewording in the 2nd comment. "Connect to primary server" is already said and done in the 1st part. ~~~ 69. IsSlotSyncWorker +/* + * Is current process the slot sync worker? + */ +bool +IsSlotSyncWorker(void) +{ + return SlotSyncWorker->pid == MyProcPid; +} 69a. For consistency with others like it, I thought this be called IsLogicalSlotSyncWorker(). ~ 69b. For consistency with the others like this, I think the extern should be declared in logicalworker.h ~~~ 70. ShutDownSlotSync + SpinLockAcquire(&SlotSyncWorker->mutex); + if (!SlotSyncWorker->pid) + { + SpinLockRelease(&SlotSyncWorker->mutex); + return; + } IMO should be comparing with InvalidPid (-1) here; not 0. ~~~ 71. + SpinLockAcquire(&SlotSyncWorker->mutex); + + /* Is it gone? */ + if (!SlotSyncWorker->pid) + break; + + SpinLockRelease(&SlotSyncWorker->mutex); Ditto. bad pids should be InvalidPid (-1), not 0. ~~~ 72. SlotSyncWorkerShmemInit + if (!found) + { + memset(SlotSyncWorker, 0, size); + SpinLockInit(&SlotSyncWorker->mutex); + } Probably here the unassigned pid should be set to InvalidPid (-1), not 0. ~~~ 73. SlotSyncWorkerRegister + if (!enable_syncslot) + { + ereport(LOG, + errmsg("skipping slots synchronization as enable_syncslot is " + "disabled.")); + return; + } /as/because/ ====== src/backend/replication/logical/tablesync.c 74. #include "commands/copy.h" +#include "commands/subscriptioncmds.h" #include "miscadmin.h" There were only #include changes but no code changes. Is the #include needed? ====== src/backend/replication/slot.c 75. ReplicationSlotCreate void ReplicationSlotCreate(const char *name, bool db_specific, ReplicationSlotPersistency persistency, - bool two_phase, bool failover) + bool two_phase, bool failover, char sync_state) The function comment goes to trouble to describe all the parameters except for 'failover' and 'sync_slate'. I think a failover comment should be added in patch 0001 and then the sync_state comment should be added in patch 0002. ~~~ 76. + /* + * Do not allow users to drop the slots which are currently being synced + * from the primary to the standby. 
+ */ + if (user_cmd && RecoveryInProgress() && + MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE) + { + ReplicationSlotRelease(); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot drop replication slot \"%s\"", name), + errdetail("This slot is being synced from the primary."))); + } Should the errdetail give the current state? ====== src/backend/tcop/postgres.c 77. + else if (IsSlotSyncWorker()) + { + ereport(DEBUG1, + (errmsg_internal("replication slot sync worker is shutting down due to administrator command"))); + + /* + * Slot sync worker can be stopped at any time. + * Use exit status 1 so the background worker is restarted. + */ + proc_exit(1); + } Explicitly saying "ereport(DEBUG1, errmsg_internal(..." is a bit overkill; it is simpler to write this as "elog(DEBUG1, ....); ====== src/include/replication/slot.h 78. +/* The possible values for 'sync_state' in ReplicationSlotPersistentData */ +#define SYNCSLOT_STATE_NONE 'n' /* None for user created slots */ +#define SYNCSLOT_STATE_INITIATED 'i' /* Sync initiated for the slot but + * not completed yet, waiting for + * the primary server to catch-up */ +#define SYNCSLOT_STATE_READY 'r' /* Initialization complete, ready + * to be synced further */ Already questioned the same elsewhere. IIUC the same tri-state values of other attributes might be used here too without needing to introduce 3 new values. e.g. #define SYNCSLOT_STATE_DISABLED 'd' /* No syncing for this slot */ #define SYNCSLOT_STATE_PENDING 'p' /* Sync is enabled but we must wait for the primary server to catch up */ #define SYNCSLOT_STATE_ENABLED 'e' /* Sync is enabled and the slot is ready to be synced */ ~~~ 79. + /* + * Is this a slot created by a sync-slot worker? + * + * Relevant for logical slots on the physical standby. + */ + char sync_state; + I assumed that "Relevant for" means "Only relevant for". It should say that. If correct, IMO a better field name might be 'standby_sync_state' ====== src/test/recovery/t/050_verify_slot_order.pl 80. +$backup_name = 'backup2'; +$primary->backup($backup_name); + +# Create standby3 +my $standby3 = PostgreSQL::Test::Cluster->new('standby3'); +$standby3->init_from_backup( + $primary, $backup_name, + has_streaming => 1, + has_restoring => 1); The mixture of 'backup2' for 'standby3' seems confusing. Is there a reason to call it backup2? ~~~ 81. +# Verify slot properties on the standby +is( $standby3->safe_psql('postgres', + q{SELECT failover, sync_state FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';} + ), + "t|r", + 'logical slot has sync_state as ready and failover as true on standby'); It might be better if the message has the same order as the SQL. Eg. "failover as true and sync_state as ready". ~~~ 82. +# Verify slot properties on the primary +is( $primary->safe_psql('postgres', + q{SELECT failover, sync_state FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';} + ), + "t|n", + 'logical slot has sync_state as none and failover as true on primary'); + It might be better if the message has the same order as the SQL. Eg. "failover as true and sync_state as none". ~~~ 83. +# Test to confirm that restart_lsn of the logical slot on the primary is synced to the standby IMO the major test parts (like this one) may need more highlighting "# ---------------------" so those comments don't get lost among all the other comments. ~~~ 84. +# let the slots get synced on the standby +sleep 2; Won't this make the test prone to failure on slow machines? 
Is there not a more deterministic way to wait for the sync, e.g. polling the slot's sync_state in pg_replication_slots with poll_query_until() instead of a fixed sleep? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Dec 7, 2023 at 1:19 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 12/6/23 11:58 AM, shveta malik wrote: > > On Wed, Dec 6, 2023 at 3:00 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> > >> Hi, > >> > >> On 12/6/23 7:18 AM, shveta malik wrote: > >>> On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > >>>> > >>>> I feel that is indirectly relying on the fact that the primary won't > >>>> advance logical slots unless physical standby has consumed data. > >>> > >>> Yes, that is the basis of this discussion. > >> > >> Yes. > >> > >>> But now on rethinking, if > >>> the user has not set 'standby_slot_names' on primary at first pace, > >>> then even if walreceiver on standby is down, slots on primary will > >>> keep on advancing > >> > >> Oh right, good point. > >> > >>> and thus we need to sync. > >> > >> Yes and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)" > >> in synchronize_slots() prevents us to do so (as I think WalRcv->latestWalEnd > >> would be invalid for a non started walreceiver). > >> > > > > But I think we do not need to deal with the case that walreceiver is > > not started at all on standby. It is always started. Walreceiver not > > getting started or down for long is a rare scenario. We have other > > checks too for 'latestWalEnd' in slotsync worker and I think we should > > retain those as is. > > > > Agree to not deal with the walreceiver being down for now (we can > still improve that part later if we encounter the case in the real > world). > yes, agreed. > Might be worth to add comments in the code (around the WalRcv->latestWalEnd > checks) that no "lagging" sync are possible if the walreceiver is not started > though? > I am a bit confused. Do you mean as a TODO item? Otherwise the comment will be opposite of the code we are writing. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
Hi, On 12/7/23 10:07 AM, shveta malik wrote: > On Thu, Dec 7, 2023 at 1:19 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: >> Might be worth to add comments in the code (around the WalRcv->latestWalEnd >> checks) that no "lagging" sync are possible if the walreceiver is not started >> though? >> > > I am a bit confused. Do you mean as a TODO item? Otherwise the comment > will be opposite of the code we are writing. Sorry for the confusion: what I meant to say is that synchronization (should it be lagging) is not possible if the walreceiver is not started (as XLogRecPtrIsInvalid(WalRcv->latestWalEnd) would be true). More precisely here (in synchronize_slots()): /* The primary_slot_name is not set yet or WALs not received yet */ SpinLockAcquire(&WalRcv->mutex); if (!WalRcv || (WalRcv->slotname[0] == '\0') || XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) { SpinLockRelease(&WalRcv->mutex); return naptime; } Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
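PS - concretely, I was thinking of something along those lines; the code is the existing check, only the comment is new and the wording is just a suggestion:

	/*
	 * The primary_slot_name is not set yet or WALs not received yet.
	 * Note that this also means that no synchronization can happen while
	 * the walreceiver is not running, even if the locally synced slots
	 * are lagging behind the primary.
	 */
	SpinLockAcquire(&WalRcv->mutex);
	if (!WalRcv ||
		(WalRcv->slotname[0] == '\0') ||
		XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
	{
		SpinLockRelease(&WalRcv->mutex);
		return naptime;
	}
	SpinLockRelease(&WalRcv->mutex);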
On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > v43-001: > 1) Support of 'failover' dump in pg_dump. It was missing earlier. > Review v43-0001 ================ 1. + * However, we do not enable failover for slots created by the table sync + * worker. This is because the table sync slot might not be fully synced on the + * standby. The reason for not enabling failover for table sync slots is not clearly mentioned. 2. During syncing, the local restart_lsn and/or local catalog_xmin of + * the newly created slot on the standby are typically ahead of those on the + * primary. Therefore, the standby needs to wait for the primary server's + * restart_lsn and catalog_xmin to catch up, which takes time. I think this part of the comment should be moved to 0002 patch. We can probably describe a bit more about why slot on standby will be ahead and about waiting time. 3. validate_standby_slots() { ... + slot = SearchNamedReplicationSlot(name, true); + + if (!slot) + goto ret_standby_slot_names_ng; + + if (!SlotIsPhysical(slot)) + { + GUC_check_errdetail("\"%s\" is not a physical replication slot", + name); + goto ret_standby_slot_names_ng; + } Why the first check (slot not found) doesn't have errdetail? The goto's in this function look a bit odd, can we try to avoid those? 4. + /* Verify syntax and parse string into list of identifiers */ + if (!SplitIdentifierString(rawname, ',', &elemlist)) + { + /* syntax error in name list */ + GUC_check_errdetail("List syntax is invalid."); ... ... + if (!SplitIdentifierString(standby_slot_names_cpy, ',', &standby_slots)) + { + /* This should not happen if GUC checked check_standby_slot_names. */ + elog(ERROR, "invalid list syntax"); Both are checking the same string but giving different error messages. I think the error message should be the same in both cases. The first one seems better. 5. In WalSndFilterStandbySlots(), the comments around else if checks should move inside the checks. It is hard to read the code in the current format. I have tried to change the same in the attached. Apart from the above, I have changed the comments and made some minor cosmetic changes in the attached. Kindly include in next version if you are fine with it. -- With Regards, Amit Kapila.
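PS - For point 3, a rough (untested) sketch of one way to avoid the gotos and also give the slot-not-found case its own errdetail. The enclosing foreach and the 'ok', 'rawname' and 'elemlist' variables are assumed to be the ones already present in the check_hook, and the message wording is only a placeholder:

	foreach(lc, elemlist)
	{
		char	   *name = lfirst(lc);
		ReplicationSlot *slot = SearchNamedReplicationSlot(name, true);

		if (!slot)
		{
			GUC_check_errdetail("replication slot \"%s\" does not exist",
								name);
			ok = false;
		}
		else if (!SlotIsPhysical(slot))
		{
			GUC_check_errdetail("\"%s\" is not a physical replication slot",
								name);
			ok = false;
		}

		/* Stop at the first bad entry. */
		if (!ok)
			break;
	}

	pfree(rawname);
	list_free(elemlist);
	return ok;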
Attachment
Hi. Here is another review comment for the patch v44-0001. ====== src/bin/pg_dump/pg_dump.c 1. getSubscriptions + if (fout->remoteVersion >= 170000) + appendPQExpBufferStr(query, + " subfailoverstate\n"); + else + appendPQExpBuffer(query, + " '%c' AS subfailoverstate\n", + LOGICALREP_FAILOVER_STATE_DISABLED); + That first appended string should include the table alias, the same as all the nearby code does. e.g. " subfailoverstate\n" should be " s.subfailoverstate\n" ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Dec 7, 2023 at 2:57 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 12/7/23 10:07 AM, shveta malik wrote: > > On Thu, Dec 7, 2023 at 1:19 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > >> Might be worth to add comments in the code (around the WalRcv->latestWalEnd > >> checks) that no "lagging" sync are possible if the walreceiver is not started > >> though? > >> > > > > I am a bit confused. Do you mean as a TODO item? Otherwise the comment > > will be opposite of the code we are writing. > > Sorry for the confusion: what I meant to say is that > synchronization (should it be lagging) is not possible if the walreceiver is not started > (as XLogRecPtrIsInvalid(WalRcv->latestWalEnd) would be true). > Sure, I will add it. Thanks for the clarification. > More precisely here (in synchronize_slots()): > > /* The primary_slot_name is not set yet or WALs not received yet */ > SpinLockAcquire(&WalRcv->mutex); > if (!WalRcv || > (WalRcv->slotname[0] == '\0') || > XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > { > SpinLockRelease(&WalRcv->mutex); > return naptime; > } > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v43, changes are: > I wanted to discuss 0003 patch about cascading standby's. It is not clear to me whether we want to allow physical standbys to further wait for cascading standby to sync their slots. If we allow such a feature one may expect even primary to wait for all the cascading standby's because otherwise still logical subscriber can be ahead of one of the cascading standby. I feel even if we want to allow such a behaviour we can do it later once the main feature is committed. I think it would be good to just allow logical walsenders on primary to wait for physical standbys represented by GUC 'standby_slot_names'. If we agree on that then it would be good to prohibit setting this GUC on standby or at least it should be a no-op even if this GUC should be set on physical standby. Thoughts? -- With Regards, Amit Kapila.
On Thursday, December 7, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > v43-001: > > 1) Support of 'failover' dump in pg_dump. It was missing earlier. > > > > Review v43-0001 > ================ > 1. > + * However, we do not enable failover for slots created by the table > + sync > + * worker. This is because the table sync slot might not be fully > + synced on the > + * standby. > > The reason for not enabling failover for table sync slots is not clearly > mentioned. > > 2. > During syncing, the local restart_lsn and/or local catalog_xmin of > + * the newly created slot on the standby are typically ahead of those > + on the > + * primary. Therefore, the standby needs to wait for the primary > + server's > + * restart_lsn and catalog_xmin to catch up, which takes time. > > I think this part of the comment should be moved to 0002 patch. We can > probably describe a bit more about why slot on standby will be ahead and > about waiting time. > > 3. > validate_standby_slots() > { > ... > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + goto ret_standby_slot_names_ng; > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); goto ret_standby_slot_names_ng; } > > Why the first check (slot not found) doesn't have errdetail? The goto's in this > function look a bit odd, can we try to avoid those? > > 4. > + /* Verify syntax and parse string into list of identifiers */ if > + (!SplitIdentifierString(rawname, ',', &elemlist)) { > + /* syntax error in name list */ > + GUC_check_errdetail("List syntax is invalid."); > ... > ... > + if (!SplitIdentifierString(standby_slot_names_cpy, ',', > + &standby_slots)) { > + /* This should not happen if GUC checked check_standby_slot_names. */ > + elog(ERROR, "invalid list syntax"); > > Both are checking the same string but giving different error messages. > I think the error message should be the same in both cases. The first one seems > better. > > 5. In WalSndFilterStandbySlots(), the comments around else if checks should > move inside the checks. It is hard to read the code in the current format. I have > tried to change the same in the attached. > > Apart from the above, I have changed the comments and made some minor > cosmetic changes in the attached. Kindly include in next version if you are fine > with it. Thanks for the comments and changes, I have addressed them. Here is the V44 patch set which addressed comments above and [1]. The new version patches also include the follow changes: V44-0001 * Let the pg_replication_slot_advance also wait for the slots specified in standby_slot_names to catch up. * added few test cases to cover the wait/wakeup logic in walsender related to standby_slot_names. * ran pgindent. V44-0002 * added few comments to explain the case when the slot is valid on primary while is invalidated on standby. Thanks Ajin for analyzing and making the tests. The pending comments on 0002 will be addressed in next version. [1] https://www.postgresql.org/message-id/CAHut%2BPvRD5V-zzTvffDdcnqB1T4JNATKGgw%2BwdQCKAgeCYr0xQ%40mail.gmail.com Best Regards, Hou zj
Attachment
On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > v43-002: > Review comments on v43-0002: ========================= 1. synchronize_one_slot() { ... + /* + * With hot_standby_feedback enabled and invalidations handled + * apropriately as above, this should never happen. + */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) + { + ereport(ERROR, + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn))); + + goto cleanup; ... } After the error, the control won't return, so the above goto doesn't make any sense. 2. synchronize_one_slot() { ... + /* Search for the named slot */ + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) + { + SpinLockAcquire(&s->mutex); + sync_state = s->data.sync_state; + SpinLockRelease(&s->mutex); + } ... ... + ReplicationSlotAcquire(remote_slot->name, true); + + /* + * Copy the invalidation cause from remote only if local slot is not + * invalidated locally, we don't want to overwrite existing one. + */ + if (MyReplicationSlot->data.invalidated == RS_INVAL_NONE) + { + SpinLockAcquire(&MyReplicationSlot->mutex); + MyReplicationSlot->data.invalidated = remote_slot->invalidated; + SpinLockRelease(&MyReplicationSlot->mutex); + } + + /* Skip the sync if slot has been invalidated locally. */ + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) + goto cleanup; ... It seems useless to acquire the slot if it is locally invalidated in the first place. Won't it be better if after the search we first check whether the slot is locally invalidated and take appropriate action? 3. After doing the above two, I think it doesn't make sense to have goto at the remaining places in synchronize_one_slot(). We can simply release the slot and commit the transaction at other places. 4. + * Returns nap time for the next sync-cycle. + */ +static long +synchronize_slots(WalReceiverConn *wrconn) Returning nap time from here appears a bit awkward. I think it is better if this function returns any_slot_updated and then the caller decides the adjustment of naptime. 5. +synchronize_slots(WalReceiverConn *wrconn) { ... ... + /* The syscache access needs a transaction env. */ + StartTransactionCommand(); + + /* + * Make result tuples live outside TopTransactionContext to make them + * accessible even after transaction is committed. + */ + MemoryContextSwitchTo(oldctx); + + /* Construct query to get slots info from the primary server */ + initStringInfo(&s); + construct_slot_query(&s); + + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); + + /* Execute the query */ + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); + pfree(s.data); + + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch failover logical slots info " + "from the primary server: %s", res->err))); + + CommitTransactionCommand(); ... ... } Where exactly in the above code, there is a syscache access as mentioned above StartTransactionCommand()? 6. - <filename>~/.pgpass</filename> file on the standby server (use + <filename>~/.pgpass</filename> file on the standby server. (use <literal>replication</literal> as the database name). Why do we need this change? 7. + standby. 
Additionally, similar to creating a logical replication slot + on the hot standby, <varname>hot_standby_feedback</varname> should be + set on the standby and a physical slot between the primary and the standby + should be used. In this, I don't understand the relation between the first part of the line: "Additionally, similar to creating a logical replication slot on the hot standby ..." with the rest. 8. However, + the slots which were in initiated sync_state ('i) and were not A single quote after 'i' is missing. 9. the slots with state 'r' and 'i' can neither be used for logical + decoded nor dropped by the user. /decoded/decoding 10. +/* + * Allocate and initialize slow sync worker shared memory + */ /slow/slot -- With Regards, Amit Kapila.
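PS - For point 4, roughly what I had in mind on the caller side, just as an (untested) sketch; 'some_slot_updated' and the naptime/threshold constants are placeholder names:

	some_slot_updated = synchronize_slots(wrconn);

	if (some_slot_updated)
	{
		/* Some slot was updated, stick to the default naptime. */
		naptime = WORKER_DEFAULT_NAPTIME_MS;
		last_update_time = GetCurrentTimestamp();
	}
	else if (TimestampDifferenceExceeds(last_update_time,
										GetCurrentTimestamp(),
										WORKER_INACTIVITY_THRESHOLD_MS))
	{
		/* No activity for a while, so back off. */
		naptime = WORKER_INACTIVITY_NAPTIME_MS;
	}

with synchronize_slots() itself just returning a bool that says whether any slot was updated in this sync-cycle.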
FYI -- the patch 0002 did not apply cleanly for me on top of the 050 test file created by patch 0001. [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v44-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v44-0002-Add-logical-slot-sync-capability-to-the-physical.patch error: patch failed: src/test/recovery/t/050_standby_failover_slots_sync.pl:289 error: src/test/recovery/t/050_standby_failover_slots_sync.pl: patch does not apply ====== Kind Regards, Peter Smith. Fujitsu Australia
On Monday, December 11, 2023 8:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > FYI -- the patch 0002 did not apply cleanly for me on top of the 050 test file > created by patch 0001. > > [postgres@CentOS7-x64 oss_postgres_misc]$ git > apply ../patches_misc/v44-0001-Allow-logical-walsenders-to-wait-for-the-ph > ysica.patch > > [postgres@CentOS7-x64 oss_postgres_misc]$ git > apply ../patches_misc/v44-0002-Add-logical-slot-sync-capability-to-the-phy > sical.patch > error: patch failed: src/test/recovery/t/050_standby_failover_slots_sync.pl:289 > error: src/test/recovery/t/050_standby_failover_slots_sync.pl: patch does not > apply Thanks for reporting. Here is the rebased patch set V44_2. (There are no code changes in this version.) Best Regards, Hou zj
Attachment
Here are some review comments for v44-0001 ====== src/backend/replication/slot.c 1. ReplicationSlotCreate * during getting changes, if the two_phase option is enabled it can skip * prepare because by that time start decoding point has been moved. So the * user will only get commit prepared. + * failover: Allows the slot to be synced to physical standbys so that logical + * replication can be resumed after failover. */ void ReplicationSlotCreate(const char *name, bool db_specific, ~ /Allows the slot.../If enabled, allows the slot.../ ====== 2. validate_standby_slots +validate_standby_slots(char **newval) +{ + char *rawname; + List *elemlist; + ListCell *lc; + bool ok = true; + + /* Need a modifiable copy of string */ + rawname = pstrdup(*newval); + + /* Verify syntax and parse string into list of identifiers */ + if (!(ok = SplitIdentifierString(rawname, ',', &elemlist))) + GUC_check_errdetail("List syntax is invalid."); + + /* + * If there is a syntax error in the name or if the replication slots' + * data is not initialized yet (i.e., we are in the startup process), skip + * the slot verification. + */ + if (!ok || !ReplicationSlotCtl) + { + pfree(rawname); + list_free(elemlist); + return ok; + } 2a. You don't need to initialize 'ok' during declaration because it is assigned immediately anyway. ~ 2b. AFAIK assignment within a conditional like this is not a normal PG coding style unless there is no other way to do it. ~ 2c. /into list/into a list/ SUGGESTION /* Verify syntax and parse string into a list of identifiers */ ok = SplitIdentifierString(rawname, ',', &elemlist); if (!ok) GUC_check_errdetail("List syntax is invalid."); ~~~ 3. assign_standby_slot_names + if (!SplitIdentifierString(standby_slot_names_cpy, ',', &standby_slots)) + { + /* This should not happen if GUC checked check_standby_slot_names. */ + elog(ERROR, "list syntax is invalid"); + } This error here and in validate_standby_slots() are different -- "list" versus "List". ====== src/backend/replication/walsender.c 4. WalSndFilterStandbySlots + foreach(lc, standby_slots_cpy) + { + char *name = lfirst(lc); + XLogRecPtr restart_lsn = InvalidXLogRecPtr; + bool invalidated = false; + char *warningfmt = NULL; + ReplicationSlot *slot; + + slot = SearchNamedReplicationSlot(name, true); + + if (slot && SlotIsPhysical(slot)) + { + SpinLockAcquire(&slot->mutex); + restart_lsn = slot->data.restart_lsn; + invalidated = slot->data.invalidated != RS_INVAL_NONE; + SpinLockRelease(&slot->mutex); + } + + /* Continue if the current slot hasn't caught up. */ + if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && + restart_lsn < wait_for_lsn) + { + /* Log warning if no active_pid for this physical slot */ + if (slot->active_pid == 0) + ereport(WARNING, + errmsg("replication slot \"%s\" specified in parameter \"%s\" does not have active_pid", + name, "standby_slot_names"), + errdetail("Logical replication is waiting on the " + "standby associated with \"%s\"", name), + errhint("Consider starting standby associated with " + "\"%s\" or amend standby_slot_names", name)); + + continue; + } + else if (!slot) + { + /* + * It may happen that the slot specified in standby_slot_names GUC + * value is dropped, so let's skip over it. + */ + warningfmt = _("replication slot \"%s\" specified in parameter \"%s\" does not exist, ignoring"); + } + else if (SlotIsLogical(slot)) + { + /* + * If a logical slot name is provided in standby_slot_names, issue + * a WARNING and skip it. 
Although logical slots are disallowed in + * the GUC check_hook(validate_standby_slots), it is still + * possible for a user to drop an existing physical slot and + * recreate a logical slot with the same name. Since it is + * harmless, a WARNING should be enough, no need to error-out. + */ + warningfmt = _("cannot have logical replication slot \"%s\" in parameter \"%s\", ignoring"); + } + else if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) + { + /* + * Specified physical slot may have been invalidated, so there is no point + * in waiting for it. + */ + warningfmt = _("physical slot \"%s\" specified in parameter \"%s\" has been invalidated, ignoring"); + } + else + { + Assert(restart_lsn >= wait_for_lsn); + } This if/else chain seems structured awkwardly. IMO it would be tidier to eliminate the NULL slot and IsLogicalSlot up-front, which would also simplify some of the subsequent conditions SUGGESTION slot = SearchNamedReplicationSlot(name, true); if (!slot) { ... } else if (SlotIsLogical(slot)) { ... } else { Assert(SlotIsPhysical(slot)) SpinLockAcquire(&slot->mutex); restart_lsn = slot->data.restart_lsn; invalidated = slot->data.invalidated != RS_INVAL_NONE; SpinLockRelease(&slot->mutex); if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) { ... } else if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && restart_lsn < wait_for_lsn) { ... } else { Assert(restart_lsn >= wait_for_lsn); } } ~~~~ 5. WalSndWaitForWal + else + { + /* already caught up and doesn't need to wait for standby_slots */ break; + } /Already/already/ ====== src/test/recovery/t/050_standby_failover_slots_sync.pl 6. +$subscriber1->safe_psql('postgres', + "CREATE TABLE tab_int (a int PRIMARY KEY);"); + +# Create a subscription with failover = true +$subscriber1->safe_psql('postgres', + "CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' " + . "PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, failover = true);" +); Consider combining these DDL statements. ~~~ 7. +$subscriber2->safe_psql('postgres', + "CREATE TABLE tab_int (a int PRIMARY KEY);"); +$subscriber2->safe_psql('postgres', + "CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' " + . "PUBLICATION regress_mypub WITH (slot_name = lsub2_slot);"); Consider combining these DDL statements ~~~ 8. +# Stop the standby associated with specified physical replication slot so that +# the logical replication slot won't receive changes until the standby slot's +# restart_lsn is advanced or the slots is removed from the standby_slot_names +# list +$publisher->safe_psql('postgres', "TRUNCATE tab_int;"); +$publisher->wait_for_catchup('regress_mysub1'); +$standby1->stop; /with specified/with the specified/ /or the slots is/or the slot is/ ~~~ 9. +# Create some data on primary /on primary/on the primary/ ~~~ 10. +$result = + $subscriber1->safe_psql('postgres', "SELECT count(*) = 10 FROM tab_int;"); +is($result, 't', + "subscriber1 doesn't get data as the sb1_slot doesn't catch up"); I felt instead of checking for 10 maybe it's more consistent with the previous code to assign again that $primary_row_count variable to 20; Then check that those primary rows are not all yet received like: SELECT count(*) < $primary_row_count FROM tab_int; ~~~ 11. +# Now that the standby lsn has advanced, primary must send the decoded +# changes to the subscription. 
+$publisher->wait_for_catchup('regress_mysub1'); +$result = + $subscriber1->safe_psql('postgres', "SELECT count(*) = 20 FROM tab_int;"); +is($result, 't', + "subscriber1 gets data from primary after standby1 is removed from the standby_slot_names list" +); /primary must/the primary must/ (continuing the suggestion from the previous review comment) Now this SQL can use the variable too: subscriber1->safe_psql('postgres', "SELECT count(*) = $primary_row_count FROM tab_int;"); ~~~ 12. + +# Create another subscription enabling failover +$subscriber1->safe_psql('postgres', + "CREATE SUBSCRIPTION regress_mysub3 CONNECTION '$publisher_connstr' " + . "PUBLICATION regress_mypub WITH (slot_name = lsub3_slot, copy_data=false, failover = true, create_slot = false);" +); Maybe give some more information in that comment: SUGGESTION Create another subscription (using the same slot created above) that enables failover. ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi, On 12/8/23 10:06 AM, Amit Kapila wrote: > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: >> >> PFA v43, changes are: >> > > I wanted to discuss 0003 patch about cascading standby's. It is not > clear to me whether we want to allow physical standbys to further wait > for cascading standby to sync their slots. If we allow such a feature > one may expect even primary to wait for all the cascading standby's > because otherwise still logical subscriber can be ahead of one of the > cascading standby. I've the same feeling here. I think it would probably be expected that the primary also wait for all the cascading standby. > I feel even if we want to allow such a behaviour we > can do it later once the main feature is committed. Agree. > I think it would > be good to just allow logical walsenders on primary to wait for > physical standbys represented by GUC 'standby_slot_names'. That makes sense for me for v1. > If we agree > on that then it would be good to prohibit setting this GUC on standby > or at least it should be a no-op even if this GUC should be set on > physical standby. I'd prefer to completely prohibit it on standby (to make it very clear it's not working at all) as long as one can enable it without downtime once the standby is promoted (which is the case currently). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v43, changes are: > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > clear to me whether we want to allow physical standbys to further wait > for cascading standby to sync their slots. If we allow such a feature > one may expect even primary to wait for all the cascading standby's > because otherwise still logical subscriber can be ahead of one of the > cascading standby. I feel even if we want to allow such a behaviour we > can do it later once the main feature is committed. I think it would > be good to just allow logical walsenders on primary to wait for > physical standbys represented by GUC 'standby_slot_names'. If we agree > on that then it would be good to prohibit setting this GUC on standby > or at least it should be a no-op even if this GUC should be set on > physical standby. > > Thoughts? IMHO, why not keep the behavior consistent across primary and standby? I mean if it doesn't require a lot of new code/design addition then it should be the user's responsibility. I mean if the user has set 'standby_slot_names' on standby then let standby also wait for cascading standby to sync their slots? Is there any issue with that behavior? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 11, 2023 at 1:02 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v44-0001 > > ~~~ > > 3. assign_standby_slot_names > > + if (!SplitIdentifierString(standby_slot_names_cpy, ',', &standby_slots)) > + { > + /* This should not happen if GUC checked check_standby_slot_names. */ > + elog(ERROR, "list syntax is invalid"); > + } > > This error here and in validate_standby_slots() are different -- > "list" versus "List". > Note here elog(ERROR,.. is used and in the other place it is part of the detail message. I have suggested in my previous review to make them the same but I overlooked the difference, so I think we should change the message to "invalid list syntax" as it was there previously. > ====== > src/backend/replication/walsender.c > > > 4. WalSndFilterStandbySlots > > > + foreach(lc, standby_slots_cpy) > + { > + char *name = lfirst(lc); > + XLogRecPtr restart_lsn = InvalidXLogRecPtr; > + bool invalidated = false; > + char *warningfmt = NULL; > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (slot && SlotIsPhysical(slot)) > + { > + SpinLockAcquire(&slot->mutex); > + restart_lsn = slot->data.restart_lsn; > + invalidated = slot->data.invalidated != RS_INVAL_NONE; > + SpinLockRelease(&slot->mutex); > + } > + > + /* Continue if the current slot hasn't caught up. */ > + if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && > + restart_lsn < wait_for_lsn) > + { > + /* Log warning if no active_pid for this physical slot */ > + if (slot->active_pid == 0) > + ereport(WARNING, > + errmsg("replication slot \"%s\" specified in parameter \"%s\" does > not have active_pid", > + name, "standby_slot_names"), > + errdetail("Logical replication is waiting on the " > + "standby associated with \"%s\"", name), > + errhint("Consider starting standby associated with " > + "\"%s\" or amend standby_slot_names", name)); > + > + continue; > + } > + else if (!slot) > + { > + /* > + * It may happen that the slot specified in standby_slot_names GUC > + * value is dropped, so let's skip over it. > + */ > + warningfmt = _("replication slot \"%s\" specified in parameter > \"%s\" does not exist, ignoring"); > + } > + else if (SlotIsLogical(slot)) > + { > + /* > + * If a logical slot name is provided in standby_slot_names, issue > + * a WARNING and skip it. Although logical slots are disallowed in > + * the GUC check_hook(validate_standby_slots), it is still > + * possible for a user to drop an existing physical slot and > + * recreate a logical slot with the same name. Since it is > + * harmless, a WARNING should be enough, no need to error-out. > + */ > + warningfmt = _("cannot have logical replication slot \"%s\" in > parameter \"%s\", ignoring"); > + } > + else if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) > + { > + /* > + * Specified physical slot may have been invalidated, so there is no point > + * in waiting for it. > + */ > + warningfmt = _("physical slot \"%s\" specified in parameter \"%s\" > has been invalidated, ignoring"); > + } > + else > + { > + Assert(restart_lsn >= wait_for_lsn); > + } > > This if/else chain seems structured awkwardly. IMO it would be tidier > to eliminate the NULL slot and IsLogicalSlot up-front, which would > also simplify some of the subsequent conditions > > SUGGESTION > > slot = SearchNamedReplicationSlot(name, true); > > if (!slot) > { > ... > } > else if (SlotIsLogical(slot)) > { > ... 
> } > else > { > Assert(SlotIsPhysical(slot)) > > SpinLockAcquire(&slot->mutex); > restart_lsn = slot->data.restart_lsn; > invalidated = slot->data.invalidated != RS_INVAL_NONE; > SpinLockRelease(&slot->mutex); > > if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) > { > ... > } > else if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && > restart_lsn < wait_for_lsn) > { > ... > } > else > { > Assert(restart_lsn >= wait_for_lsn); > } > } > +1. -- With Regards, Amit Kapila.
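As an illustration of the suggested shape, here is a sketch with the elided branches filled in from the quoted v44 code (this is only an illustration of the restructuring, not the actual patch; the errdetail/errhint arguments are omitted for brevity, and the variables are the ones already declared in the loop shown above). A side benefit: once NULL and logical slots are handled up front, the catch-up branch no longer needs the !invalidated and !XLogRecPtrIsInvalid() tests.

/* Sketch of one iteration of the foreach loop over standby_slots_cpy. */
slot = SearchNamedReplicationSlot(name, true);

if (!slot)
{
	/* Slot listed in standby_slot_names may have been dropped; skip it. */
	warningfmt = _("replication slot \"%s\" specified in parameter \"%s\" does not exist, ignoring");
}
else if (SlotIsLogical(slot))
{
	/* A same-named logical slot may have replaced the physical one; skip it. */
	warningfmt = _("cannot have logical replication slot \"%s\" in parameter \"%s\", ignoring");
}
else
{
	Assert(SlotIsPhysical(slot));

	SpinLockAcquire(&slot->mutex);
	restart_lsn = slot->data.restart_lsn;
	invalidated = slot->data.invalidated != RS_INVAL_NONE;
	SpinLockRelease(&slot->mutex);

	if (XLogRecPtrIsInvalid(restart_lsn) || invalidated)
	{
		/* The slot has been invalidated; no point in waiting for it. */
		warningfmt = _("physical slot \"%s\" specified in parameter \"%s\" has been invalidated, ignoring");
	}
	else if (restart_lsn < wait_for_lsn)
	{
		/* This standby has not caught up yet; keep waiting for it. */
		if (slot->active_pid == 0)
			ereport(WARNING,
					errmsg("replication slot \"%s\" specified in parameter \"%s\" does not have active_pid",
						   name, "standby_slot_names"));
		continue;
	}
	else
		Assert(restart_lsn >= wait_for_lsn);
}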
On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > PFA v43, changes are:
> > >
> >
> > I wanted to discuss 0003 patch about cascading standby's. It is not
> > clear to me whether we want to allow physical standbys to further wait
> > for cascading standby to sync their slots. If we allow such a feature
> > one may expect even primary to wait for all the cascading standby's
> > because otherwise still logical subscriber can be ahead of one of the
> > cascading standby. I feel even if we want to allow such a behaviour we
> > can do it later once the main feature is committed. I think it would
> > be good to just allow logical walsenders on primary to wait for
> > physical standbys represented by GUC 'standby_slot_names'. If we agree
> > on that then it would be good to prohibit setting this GUC on standby
> > or at least it should be a no-op even if this GUC should be set on
> > physical standby.
> >
> > Thoughts?
>
> IMHO, why not keep the behavior consistent across primary and standby?
> I mean if it doesn't require a lot of new code/design addition then
> it should be the user's responsibility. I mean if the user has set
> 'standby_slot_names' on standby then let standby also wait for
> cascading standby to sync their slots? Is there any issue with that
> behavior?
>

Without the primary also waiting for the cascading standbys, it is not helpful for just the standby to wait.

Currently, logical walsenders on the primary wait for the physical standbys to take the changes before they update their own logical slots. But they wait only for their immediate standbys, not for the cascading standbys. Although on the first standby we do have logic where the slot-sync workers wait for cascading standbys before they update their own (synced) slots (see patch 0003), this does not guarantee that logical subscribers of the primary will never be ahead of the cascading standbys. Consider this timeline:

t1: logical walsender on the primary is waiting for standby1 (first standby).
t2: physical walsender on standby1 is stuck, so there is a delay in sending these changes to standby2 (cascading standby).
t3: standby1 has taken the changes and sends confirmation to the primary.
t4: logical walsender on the primary receives the confirmation from standby1 and updates the slot; logical subscribers of the primary also receive the changes.
t5: standby2 has not received the changes yet because the physical walsender on standby1 is still stuck; the slot-sync worker is still waiting for standby2 (cascading) before it updates its own (synced) slots.
t6: standby2 is promoted to become the new primary.

Now we are in a state wherein the primary, the logical subscriber, and the first standby have some changes, but the cascading standby does not. And the logical slots on the primary were updated without confirming whether the cascading standby had taken the changes. This is a problem, and we do not have a simple solution for it yet.

thanks
Shveta
On Mon, Dec 11, 2023 at 2:20 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > PFA v43, changes are: > > > > > > > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > > clear to me whether we want to allow physical standbys to further wait > > > for cascading standby to sync their slots. If we allow such a feature > > > one may expect even primary to wait for all the cascading standby's > > > because otherwise still logical subscriber can be ahead of one of the > > > cascading standby. I feel even if we want to allow such a behaviour we > > > can do it later once the main feature is committed. I think it would > > > be good to just allow logical walsenders on primary to wait for > > > physical standbys represented by GUC 'standby_slot_names'. If we agree > > > on that then it would be good to prohibit setting this GUC on standby > > > or at least it should be a no-op even if this GUC should be set on > > > physical standby. > > > > > > Thoughts? > > > > IMHO, why not keep the behavior consistent across primary and standby? > > I mean if it doesn't require a lot of new code/design addition then > > it should be the user's responsibility. I mean if the user has set > > 'standby_slot_names' on standby then let standby also wait for > > cascading standby to sync their slots? Is there any issue with that > > behavior? > > > > Without waiting for cascading standby on primary, it won't be helpful > to just wait on standby. > > Currently logical walsenders on primary waits for physical standbys to > take changes before they update their own logical slots. But they wait > only for their immediate standbys and not for cascading standbys. > Although, on first standby, we do have logic where slot-sync workers > wait for cascading standbys before they update their own slots (synced > ones, see patch3). But this does not guarantee that logical > subscribers on primary will never be ahead of the cascading standbys. > Let us consider this timeline: > > t1: logical walsender on primary waiting for standby1 (first standby). > t2: physical walsender on standby1 is stuck and thus there is delay in > sending these changes to standby2 (cascading standby). > t3: standby1 has taken changes and sends confirmation to primary. > t4: logical walsender on primary receives confirmation from standby1 > and updates slot, logical subscribers of primary also receives the > changes. > t5: standby2 has not received changes yet as physical walsender on > standby1 is still stuck, slotsync worker still waiting for standby2 > (cascading) before it updates its own slots (synced ones). > t6: standby2 is promoted to become primary. > > Now we are in a state wherein primary, logical subscriber and first > standby has some changes but cascading standby does not. And logical > slots on primary were updated w/o confirming if cascading standby has > taken changes or not. This is a problem and we do not have a simple > solution for this yet. > > thanks > Shveta PFA v45, changes in patch002: --Addressed comments in [1] and [2] --Added holistic test case for patch02. Thanks Nisha for the test implementation. 
[1]: https://www.postgresql.org/message-id/CAHut%2BPuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv%2BzcT-f0GA%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAA4eK1KbhdjKqui%3Dfr4Ny2TwGAFU9WLWTdypN%2BWG0WEfnBR%3D4w%40mail.gmail.com thanks Shveta
Attachment
On Sun, Dec 10, 2023 at 4:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > v43-002: > > > > Review comments on v43-0002: > ========================= Thanks for the feedback Amit. Addressed these in v45. Please find my response on a few of these. > 1. > synchronize_one_slot() > { > ... > + /* > + * With hot_standby_feedback enabled and invalidations handled > + * apropriately as above, this should never happen. > + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) > + { > + ereport(ERROR, > + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn))); > + > + goto cleanup; > ... > } > > After the error, the control won't return, so the above goto doesn't > make any sense. > > 2. > synchronize_one_slot() > { > ... > + /* Search for the named slot */ > + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) > + { > + SpinLockAcquire(&s->mutex); > + sync_state = s->data.sync_state; > + SpinLockRelease(&s->mutex); > + } > ... > ... > + ReplicationSlotAcquire(remote_slot->name, true); > + > + /* > + * Copy the invalidation cause from remote only if local slot is not > + * invalidated locally, we don't want to overwrite existing one. > + */ > + if (MyReplicationSlot->data.invalidated == RS_INVAL_NONE) > + { > + SpinLockAcquire(&MyReplicationSlot->mutex); > + MyReplicationSlot->data.invalidated = remote_slot->invalidated; > + SpinLockRelease(&MyReplicationSlot->mutex); > + } > + > + /* Skip the sync if slot has been invalidated locally. */ > + if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) > + goto cleanup; > ... > > It seems useless to acquire the slot if it is locally invalidated in > the first place. Won't it be better if after the search we first check > whether the slot is locally invalidated and take appropriate action? > If we don't acquire the slot first, there could be a race condition that the local slot could be invalidated just after checking the invalidated flag. See InvalidatePossiblyObsoleteSlot() where it invalidates slot directly if the slot is not acquired by other processes. Thus, I have not removed 'ReplicationSlotAcquire' but I have re-structured the code a little bit to get rid of duplicate code in 'if' and 'else' part for invalidation logic. > 3. After doing the above two, I think it doesn't make sense to have > goto at the remaining places in synchronize_one_slot(). We can simply > release the slot and commit the transaction at other places. > > 4. > + * Returns nap time for the next sync-cycle. > + */ > +static long > +synchronize_slots(WalReceiverConn *wrconn) > > Returning nap time from here appears a bit awkward. I think it is > better if this function returns any_slot_updated and then the caller > decides the adjustment of naptime. > > 5. > +synchronize_slots(WalReceiverConn *wrconn) > { > ... > ... > + /* The syscache access needs a transaction env. */ > + StartTransactionCommand(); > + > + /* > + * Make result tuples live outside TopTransactionContext to make them > + * accessible even after transaction is committed. 
> + */ > + MemoryContextSwitchTo(oldctx); > + > + /* Construct query to get slots info from the primary server */ > + initStringInfo(&s); > + construct_slot_query(&s); > + > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); > + > + /* Execute the query */ > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); > + pfree(s.data); > + > + if (res->status != WALRCV_OK_TUPLES) > + ereport(ERROR, > + (errmsg("could not fetch failover logical slots info " > + "from the primary server: %s", res->err))); > + > + CommitTransactionCommand(); > ... > ... > } > > Where exactly in the above code, there is a syscache access as > mentioned above StartTransactionCommand()? > It is in walrcv_exec (libpqrcv_processTuples). I have changed the comments to add this info. > 6. > - <filename>~/.pgpass</filename> file on the standby server (use > + <filename>~/.pgpass</filename> file on the standby server. (use > <literal>replication</literal> as the database name). > > Why do we need this change? We don't, removed it. > > 7. > + standby. Additionally, similar to creating a logical replication slot > + on the hot standby, <varname>hot_standby_feedback</varname> should be > + set on the standby and a physical slot between the primary and the standby > + should be used. > > In this, I don't understand the relation between the first part of the > line: "Additionally, similar to creating a logical replication slot on > the hot standby ..." with the rest. > > 8. > However, > + the slots which were in initiated sync_state ('i) and were not > > A single quote after 'i' is missing. > > 9. > the slots with state 'r' and 'i' can neither be used for logical > + decoded nor dropped by the user. > > /decoded/decoding > > 10. > +/* > + * Allocate and initialize slow sync worker shared memory > + */ > > /slow/slot > > -- > With Regards, > Amit Kapila.
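Coming back to the ordering question in comment #2 above (why synchronize_one_slot() acquires the slot before acting on the invalidation flag), here is a minimal sketch of the reasoning; it is an illustration only, not the v45 code:

/*
 * Sketch only: acquire the local synced slot first, then inspect and
 * update data.invalidated.  Checking the flag before acquiring would
 * leave a window in which InvalidatePossiblyObsoleteSlot() invalidates
 * the slot, since it invalidates a slot directly when no process has it
 * acquired.
 */
ReplicationSlotAcquire(remote_slot->name, true);

SpinLockAcquire(&MyReplicationSlot->mutex);
/* Copy the remote invalidation cause only if not already invalidated locally. */
if (MyReplicationSlot->data.invalidated == RS_INVAL_NONE)
	MyReplicationSlot->data.invalidated = remote_slot->invalidated;
SpinLockRelease(&MyReplicationSlot->mutex);

/* Holding the slot, this check can no longer race with invalidation. */
if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
	goto cleanup;			/* skip the sync; release at the patch's cleanup label */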
On Thu, Dec 7, 2023 at 1:33 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi. > > Here are my review comments for patch v43-0002. > Thanks for the feedback. I have addressed most of these in v45. Please find my response on a few which are pending or are not needed. > ====== > Commit message > > 1. > The nap time of worker is tuned according to the activity on the primary. > The worker starts with nap time of 10ms and if no activity is observed on > the primary for some time, then nap time is increased to 10sec. And if > activity is observed again, nap time is reduced back to 10ms. > > ~ > /nap time of worker/nap time of the worker/ > /And if/If/ > > ~~~ > > 2. > Slots synced on the standby can be identified using 'sync_state' column of > pg_replication_slots view. The values are: > 'n': none for user slots, > 'i': sync initiated for the slot but waiting for the remote slot on the > primary server to catch up. > 'r': ready for periodic syncs. > > ~ > > /identified using/identified using the/ > > The meaning of "identified by" is unclear to me. It also seems to > clash with later descriptions in system-views.sgml. Please see my > later review comment about it (in the sgml file) > I have rephrased it, please check now and let me know. > ====== > doc/src/sgml/bgworker.sgml > > 3. > bgw_start_time is the server state during which postgres should start > the process; it can be one of BgWorkerStart_PostmasterStart (start as > soon as postgres itself has finished its own initialization; processes > requesting this are not eligible for database connections), > BgWorkerStart_ConsistentState (start as soon as a consistent state has > been reached in a hot standby, allowing processes to connect to > databases and run read-only queries), and > BgWorkerStart_RecoveryFinished (start as soon as the system has > entered normal read-write state. Note that the > BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are > equivalent in a server that's not a hot standby), and > BgWorkerStart_ConsistentState_HotStandby (same meaning as > BgWorkerStart_ConsistentState but it is more strict in terms of the > server i.e. start the worker only if it is hot-standby; if it is > consistent state in non-standby, worker will not be started). Note > that this setting only indicates when the processes are to be started; > they do not stop when a different state is reached. > > ~ > > 3a. > This seems to have grown to become just one enormous sentence that is > too hard to read. IMO this should be changed to be a <variablelist> of > possible values instead of a big slab of text. I suspect it could also > be simplified quite a lot -- something like below > > SUGGESTION > bgw_start_time is the server state during which postgres should start > the process. Note that this setting only indicates when the processes > are to be started; they do not stop when a different state is reached. > Possible values are: > > - BgWorkerStart_PostmasterStart (start as soon as postgres itself has > finished its own initialization; processes requesting this are not > eligible for database connections) > > - BgWorkerStart_ConsistentState (start as soon as a consistent state > has been reached in a hot-standby, allowing processes to connect to > databases and run read-only queries) > > - BgWorkerStart_RecoveryFinished (start as soon as the system has > entered normal read-write state. 
Note that the > BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are > equivalent in a server that's not a hot standby) > > - BgWorkerStart_ConsistentState_HotStandby (same meaning as > BgWorkerStart_ConsistentState but it is more strict in terms of the > server i.e. start the worker only if it is hot-standby; if it is a > consistent state in non-standby, the worker will not be started). > > ~~~ > > 3b. > "i.e. start the worker only if it is hot-standby; if it is consistent > state in non-standby, worker will not be started" > > ~ > > Why is it even necessary to say the 2nd part "if it is consistent > state in non-standby, worker will not be started". It seems redundant > given 1st part says the same, right? > > > ====== > doc/src/sgml/config.sgml > > 4. > + <para> > + The standbys corresponding to the physical replication slots in > + <varname>standby_slot_names</varname> must enable > + <varname>enable_syncslot</varname> for the standbys to receive > + failover logical slots changes from the primary. > + </para> > > 4a. > Somehow "must enable enable_syncslot" seemed strange. Maybe re-word like: > > "must enable slot synchronization (see enable_syncslot)" > > OR > > "must configure enable_syncslot = true" > > ~~~ > > 4b. > (seems like repetitive use of "the standbys") > > /for the standbys to/to/ > > OR > > /for the standbys to/so they can/ > > ~~~ > > 5. > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > + <filename>~/.pgpass</filename> file on the standby server. (use > > This rearranged period seems unrelated to the current patch. Maybe > don't touch this. > > ~~~ > > 6. > + <para> > + Specify <literal>dbname</literal> in > + <varname>primary_conninfo</varname> string to allow synchronization > + of slots from the primary server to the standby server. > + This will only be used for slot synchronization. It is ignored > + for streaming. > </para> > > The wording "to allow synchronization of slots" seemed misleading to > me. Isn't that more the purpose of the 'enable_syncslot' GUC? I think > the intended wording is more like below: > > SUGGESTION > If slot synchronization is enabled then it is also necessary to > specify <literal>dbname</literal> in the > <varname>primary_conninfo</varname> string. This will only be used for > slot synchronization. It is ignored for streaming. > > ====== > doc/src/sgml/logicaldecoding.sgml > > 7. > + <para> > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the failover option during slot creation and set > + <varname>enable_syncslot</varname> on the standby. For the synchronization > + to work, it is mandatory to have physical replication slot between the > + primary and the standby. This physical replication slot for the standby > + should be listed in <varname>standby_slot_names</varname> on the primary > + to prevent the subscriber from consuming changes faster than the hot > + standby. Additionally, similar to creating a logical replication slot > + on the hot standby, <varname>hot_standby_feedback</varname> should be > + set on the standby and a physical slot between the primary and the standby > + should be used. > + </para> > > > 7a. > /creation and set/creation and setting/ > /to have physical replication/to have a physical replication/ > > ~ > > 7b. > It's unclear why this is saying "should be listed in > standby_slot_names" and "hot_standby_feedback should be set on the > standby". 
Why is it saying "should" instead of MUST -- are these > optional? I thought the GUC validation function mandates these (???). > standby_slot_names setting is not mandatory, it is recommended though. OTOH hot_standby_feedback setting is mandatory. So I have changed accordingly. > ~ > > 7c. > Why does the paragraph say "and a physical slot between the primary > and the standby should be used."; isn't that exactly what was already > written earlier ("For the synchronization to work, it is mandatory to > have physical replication slot between the primary and the standby" > Removed the duplicate line. > ~~~ > > 8. > + <para> > + By enabling synchronization of slots, logical replication can be resumed > + after failover depending upon the > + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield> > + for the synchronized slots on the standby at the time of failover. > + The slots which were in ready sync_state ('r') on the standby before > + failover can be used for logical replication after failover. However, > + the slots which were in initiated sync_state ('i) and were not > + sync-ready ('r') at the time of failover will be dropped and logical > + replication for such slots can not be resumed after failover. This applies > + to the case where a logical subscription is disabled before > failover and is > + enabled after failover. If the synchronized slot due to disabled > + subscription could not be made sync-ready ('r') on standby, then the > + subscription can not be resumed after failover even when enabled. > > > 8a. > This feels overcomplicated -- too much information? > > SUGGESTION > depending upon the ... sync_state for the synchronized slots on the > standby at the time of failover. Only slots that were in ready > sync_state ('r') on the standby before failover can be used for > logical replication after failover > > ~~~ > > 8b. > + the slots which were in initiated sync_state ('i) and were not > + sync-ready ('r') at the time of failover will be dropped and logical > + replication for such slots can not be resumed after failover. This applies > + to the case where a logical subscription is disabled before > failover and is > + enabled after failover. If the synchronized slot due to disabled > + subscription could not be made sync-ready ('r') on standby, then the > + subscription can not be resumed after failover even when enabled. > > But isn't ALL that part pretty much redundant information for the > user? I thought these are not ready state, so they are not usable... > End-Of-Story. Isn't everything else just more like implementation > details, which the user does not need to know about? > 'sync_state' is a way to monitor the state of synchronization and I feel it is important to tell what happens with 'i' state slots. Also there was a comment to add this info in doc that disabled subscriptions are not guaranteed to be usable if enabled after failover. Thus it was added and rest of the info forms a base for that. We can trim down or rephrase if needed. > ~~~ > > 9. > + If the primary is idle, making the synchronized slot on the standby > + as sync-ready ('r') for enabled subscription may take noticeable time. > + This can be sped up by calling the > + <function>pg_log_standby_snapshot</function> function on the primary. > + </para> > > SUGGESTION > If the primary is idle, then the synchronized slots on the standby may > take a noticeable time to reach the ready ('r') sync_state. 
This can > be sped up by calling the > <function>pg_log_standby_snapshot</function> function on the primary. > > ====== > doc/src/sgml/system-views.sgml > > 10. > + > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>sync_state</structfield> <type>char</type> > + </para> > + <para> > + Defines slot synchronization state. This is meaningful on the physical > + standby which has enabled slots synchronization. > + </para> > > I felt that this part "which has enabled slots synchronization" should > cross-reference to the 'sync_enabled' GUC. > > ~~~ > > 11. > + <para> > + State code: > + <literal>n</literal> = none for user created slots, > + <literal>i</literal> = sync initiated for the slot but slot is not ready > + yet for periodic syncs, > + <literal>r</literal> = ready for periodic syncs. > + </para> > > I'm wondering why don't we just reuse 'd' (disabled), 'p' (pending), > 'e' (enabled) like the other tri-state attributes are using. > I think it is not a property of a slot where we say enabled/disabled. It is more like an operation and thus initiated, ready etc sounds better. These states are similar to the ones maintained for table-sync operation (SUBREL_STATE_INIT, SUBREL_STATE_READY etc) > > 12. > + <para> > + The hot standby can have any of these sync_state for the slots but on a > + hot standby, the slots with state 'r' and 'i' can neither be > used for logical > + decoded nor dropped by the user. The primary server will have sync_state > + as 'n' for all the slots. But if the standby is promoted to become the > + new primary server, sync_state can be seen 'r' as well. On this new > + primary server, slots with sync_state as 'r' and 'n' will > behave the same. > + </para></entry> > + </row> > > 12a. > /logical decoded/logical decoding/ > > ~ > > 12b. > "sync_state as 'r' and 'n' will behave the same" sounds kind of hacky. > Is there no alternative? > I am reviewing your suggestion on 'r' to 'n' conversion on promotion given later in this email. So give me some more time. > Anyway, IMO mentioning about primary server states seems overkill, > because you already said "This is meaningful on the physical standby" > which I took as implying that it is *not* meaningful from the POV of > the primary server. > In case we planned to retain 'r', it then makes sense to document that the sync_state on primary can also be 'r' if the primary was promoted from a standby, because this is a special case which the user may not be aware of. > In light of this, I'm wondering if a better name for this attribute > would be: 'standby_sync_state' > sync_state has some value for primary too. It is not null on primary. Thus the current name seems a better choice. > ====== > src/backend/access/transam/xlogrecovery.c > > 13. > + /* > + * Shutdown the slot sync workers to prevent potential conflicts between > + * user processes and slotsync workers after a promotion. Additionally, > + * drop any slots that have initiated but not yet completed the sync > + * process. > + */ > + ShutDownSlotSync(); > + slotsync_drop_initiated_slots(); > + > > Is this where maybe the 'sync_state' should also be updated for > everything so you are not left with confusion about different states > on a node that is no longer a standby node? > yes, this is the place. But this needs more thought as it may cause too much disk activity during promotion. so let me analyze and come back. > ====== > src/backend/postmaster/postmaster.c > > 14. 
PostmasterMain > > ApplyLauncherRegister(); > > + SlotSyncWorkerRegister(); > + > > Every other function call here is heavily commented but there is a > conspicuous absence of a comment here. > Added some comments, but not very confident on those, so let me know. > ~~~ > > 15. bgworker_should_start_now > > if (start_time == BgWorkerStart_ConsistentState) > return true; > + else if (start_time == BgWorkerStart_ConsistentState_HotStandby && > + pmState != PM_RUN) > + return true; > /* fall through */ > Change "else if" to "if" would be simpler. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 16. > + for (opt = opts; opt->keyword != NULL; ++opt) > + { > + /* > + * If multiple dbnames are specified, then the last one will be > + * returned > + */ > + if (strcmp(opt->keyword, "dbname") == 0 && opt->val && > + opt->val[0] != '\0') > + dbname = pstrdup(opt->val); > + } > > This can use a tidier C99 style to declare 'opt' as the loop variable. > > ~~~ > > 17. > static void > libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, > - bool failover) > + bool failover) > > What is this change for? Or, if something is wrong with the indent > then anyway it should be fixed in patch 0001. > yes, it should go to patch01. Done. > ====== > src/backend/replication/logical/logical.c > > 18. > > + /* > + * Slots in state SYNCSLOT_STATE_INITIATED should have been dropped on > + * promotion. > + */ > + if (!RecoveryInProgress() && slot->data.sync_state == > SYNCSLOT_STATE_INITIATED) > + elog(ERROR, "replication slot \"%s\" was not synced completely from > the primary server", > + NameStr(slot->data.name)); > + > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && slot->data.sync_state != SYNCSLOT_STATE_NONE) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot use replication slot \"%s\" for logical decoding", > + NameStr(slot->data.name)), > + errdetail("This slot is being synced from the primary server."), > + errhint("Specify another replication slot."))); > + > > 18a. > > Instead of having !RecoveryInProgress() and RecoveryInProgress() in > separate conditions is the code simpler like: > > SUGGESTION > > if (RecoveryInProgress()) > { > /* Do not allow ... */ > if (slot->data.sync_state != SYNCSLOT_STATE_NONE) ... > } > else > { > /* Slots in state... */ > if (slot->data.sync_state == SYNCSLOT_STATE_INITIATED) ... > } > > ~ > > 18b. > Should the errdetail give the current state? > I think it is not needed, current info looks good enough. User can always use pg_replication_slots to monitor sync_state info. > ====== > src/backend/replication/logical/slotsync.c > > 19. > +/* > + * Number of attempts for wait_for_primary_slot_catchup() after > + * which it aborts the wait and the slot sync worker then moves > + * to the next slot creation/sync. > + */ > +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 > > Given this is only used within one static function, I'm wondering if > it would be tidier to also move this macro to within that function. > > ~~~ > > 20. wait_for_primary_slot_catchup > > +/* > + * Wait for remote slot to pass locally reserved position. > + * > + * Ping and wait for the primary server for > + * WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS during a slot creation, if it still > + * does not catch up, abort the wait. The ones for which wait is aborted will > + * attempt the wait and sync in the next sync-cycle. 
> + * > + * *persist will be set to false if the slot has disappeared or was invalidated > + * on the primary; otherwise, it will be set to true. > + */ > > 20a. > The comment doesn't say the meaning of the boolean returned. > > ~ > > 20b. > /*persist will be set/If passed, *persist will be set/ > > ~~~ > > 21. > + appendStringInfo(&cmd, > + "SELECT conflicting, restart_lsn, confirmed_flush_lsn," > + " catalog_xmin FROM pg_catalog.pg_replication_slots" > + " WHERE slot_name = %s", > + quote_literal_cstr(remote_slot->name)); > > Somehow, I felt it is more readable if the " FROM" starts on a new line. > > e.g. > "SELECT conflicting, restart_lsn, confirmed_flush_lsn, catalog_xmin" > " FROM pg_catalog.pg_replication_slots" > " WHERE slot_name = %s" > > ~~~ > > 22. > + ereport(ERROR, > + (errmsg("could not fetch slot info for slot \"%s\" from the" > + " primary server: %s", > + remote_slot->name, res->err))); > > Perhaps the message can be shortened like: > "could not fetch slot \"%s\" info from the primary server: %s" > > ~~~ > > 23. > + ereport(WARNING, > + (errmsg("slot \"%s\" disappeared from the primary server," > + " slot creation aborted", remote_slot->name))); > > Would this be better split into parts? > > SUGGESTION > errmsg "slot \"%s\" creation aborted" > errdetail "slot was not found on the primary server" > > ~~~ > > 24. > + ereport(WARNING, > + (errmsg("slot \"%s\" invalidated on the primary server," > + " slot creation aborted", remote_slot->name))); > > (similar to previous) > > SUGGESTION > errmsg "slot \"%s\" creation aborted" > errdetail "slot was invalidated on the primary server" > > ~~~ > > 25. > + /* > + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin > + * are expected to be valid/non-null. > + */ > > SUGGESTION > Having got a valid restart_lsn, the confirmed_lsn and catalog_xmin are > expected to be valid/non-null. > > ~~~ > > 26. slotsync_drop_initiated_slots > > +/* > + * Drop the slots for which sync is initiated but not yet completed > + * i.e. they are still waiting for the primary server to catch up. > + */ > > I found "waiting for the primary server to catch up" to be difficult > to understand without knowing the full details, but it is not really > described properly until a much larger comment that is buried in the > synchronize_one_slot(). So I think all this needs explanation up-front > in the file, which you can refer to. I have repeated this same review > comment in a couple of places. > I have updated header of file with details and gave reference here and all such similar places. > ~~~ > > 27. get_local_synced_slot_names > > +static List * > +get_local_synced_slot_names(void) > +{ > + List *localSyncedSlots = NIL; > > 27a. > It's not returning a list of "names" though, so is this an appropriate > function name? > > ~~~ > > 27b. > Suggest just call that ('localSyncedSlots') differently. > - In slotsync_drop_initiated_slots() function they are just called 'slots' > - In drop_obsolete_slots() function it is called 'local_slot_list' > > IMO it is better if all these are consistently named -- just all lists > 'slots' or all 'local_slots' or whatever. > > ~~~ > > 28. check_sync_slot_validity > > +static bool > +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > > Somehow this wording "validity" seems like a misleading function name, > because the return value has nothing to do with the slot field > invalidated. 
> > The validity/locally_invalidated stuff is a secondary return as a side > effect for the "true" case. > > A more accurate function name would be more like check_sync_slot_on_remote(). > > ~~~ > > 29. check_sync_slot_validity > > +static bool > +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > +{ > + ListCell *cell; > > There is inconsistent naming -- > > ListCell lc; ListCell cell; ListCell lc_slot; etc.. > > IMO the more complicated names aren't of much value -- probably > everything can be changed to 'lc' for consistency. > > ~~~ > > 30. drop_obsolete_slots > > + /* > + * Get the list of local 'synced' slot so that those not on remote could > + * be dropped. > + */ > > /slot/slots/ > > Also, I don't think it is necessary to say "so that those not on > remote could be dropped." -- That is already described in the function > comment and again in a comment later in the loop. That seems enough. > If the function name get_local_synced_slot_names() is improved a bit > the comment seems redundant because it is obvious from the function > name. > > ~~~ > > 31. > + foreach(lc_slot, local_slot_list) > + { > + ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc_slot); > + bool local_exists = false; > + bool locally_invalidated = false; > + > + local_exists = check_sync_slot_validity(local_slot, remote_slot_list, > + &locally_invalidated); > > Shouldn't that 'local_exists' variable be called 'remote_exists'? > That's what the other comments seem to be saying. > > ~~~ > > 32. construct_slot_query > > + appendStringInfo(s, > + "SELECT slot_name, plugin, confirmed_flush_lsn," > + " restart_lsn, catalog_xmin, two_phase, failover," > + " database, pg_get_slot_invalidation_cause(slot_name)" > + " FROM pg_catalog.pg_replication_slots" > + " WHERE failover and sync_state != 'i'"); > > Just wondering if substituting the SYNCSLOT_STATE_INITIATED constant > here might be more appropriate than hardwiring 'i'. Why have a > constant but not use it? > On hold. I could not find quote_* function for a character just like we have 'quote_literal_cstr' for string. Will review. Let me know if you know. > ~~~ > > 33. synchronize_one_slot > > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > +{ > + ReplicationSlot *s; > + char sync_state = 0; > > 33a. > It seems strange that the sync_state is initially assigned something > other than the 3 legal values. Should this be defaulting to > SYNCSLOT_STATE_NONE instead? > No, that will change the flow. It should stay uninitialized if the slot is not found. I have changed assignment to '\0' for better clarity. > ~ > > 33b. > I think it is safer to default the *slot_updated = false; because the > code appears to assume it was false already which may or may not be > true. > It is initialized to false in the caller, so we are good here. > ~~~ > > 34. > + /* > + * Make sure that concerned WAL is received before syncing slot to target > + * lsn received from the primary server. > + * > + * This check should never pass as on the primary server, we have waited > + * for the standby's confirmation before updating the logical slot. > + */ > > Maybe this comment should mention up-front that it is just a "Sanity check:" > > ~~~ > > 35. > + /* > + * With hot_standby_feedback enabled and invalidations handled > + * apropriately as above, this should never happen. 
> + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) > + { > + ereport(ERROR, > + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn))); > + > + goto cleanup; > + } > > 35a. > IIUC then this another comment that should say it is just a "Sanity-check:". > > ~ > > 35b. > I was wondering if there should be Assert(hot_standby_feedback) here > also. The comment "With hot_standby_feedback enabled" is a bit vague > whereas including an Assert will clarify that it must be set. > I think assert is not needed. Slot-sync worker will never start if hot_standby_feedback is disabled. If we put assert here, we need to assert at all other places too where we use other related GUCs like primary_slot_name, conn_info etc. > ~ > > 35c. > Since it says "this should never happen" then it appears elog is more > appropriate than ereport because translations are not needed, right? > > ~ > > 35d. > The ERROR will make that goto cleanup unreachable, won't it? > > ~~~ > > 36. > + /* > + * Already existing slot but not ready (i.e. waiting for the primary > + * server to catch-up), lets attempt to make it sync-ready now. > + */ > > /lets/let's/ > > ~~~ > > 37. > + /* > + * Refer the slot creation part (last 'else' block) for more details > + * on this wait. > + */ > + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || > + TransactionIdPrecedes(remote_slot->catalog_xmin, > + MyReplicationSlot->data.catalog_xmin)) > + { > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL)) > + { > + goto cleanup; > + } > + } > > 37a. > Having to jump forward to understand earlier code seems backward. IMO > there should be a big comment atop this module about this subject > which the comment here can just refer to. I will write more about this > topic later (below). > > ~ > > 37b. > The extra code curly braces are not needed. > > ~~~ > > 38. > + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " > + "now", remote_slot->name)); > > Better to put the whole errmsg() on a newline instead of splitting the > string like that. > > ~~~ > > 39. > + /* User created slot with the same name exists, raise ERROR. */ > + else if (sync_state == SYNCSLOT_STATE_NONE) > + { > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("skipping sync of slot \"%s\" as it is a user created" > + " slot", remote_slot->name), > + errdetail("This slot has failover enabled on the primary and" > + " thus is sync candidate but user created slot with" > + " the same name already exists on the standby"))); > + } > > I felt it would be better to eliminate this case immediately up-front > when you first searched for the slot names. e.g. code like below. IIUC > this refactor also means the default sync_state can be assigned a > normal value (as I suggested above) instead of the strange assignment > to 0. I feel NULL character ('\0') is better default for local variable sync_slot as we specifically wanted it to be NULL if not assigned. Assigning it to 'SYNCSLOT_STATE_NONE' will be misleading. But moved the 'SYNCSLOT_STATE_NONE' related error though. 
> > + /* Search for the named slot */ > + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) > + { > + SpinLockAcquire(&s->mutex); > + sync_state = s->data.sync_state; > + SpinLockRelease(&s->mutex); > > INSERT HERE > + /* User-created slot with the same name exists, raise ERROR. */ > + if (sync_state == SYNCSLOT_STATE_NONE) > + ereport(ERROR, ... > + } > > ~~~ > > 40. > + /* Otherwise create the slot first. */ > + else > + { > > Insert a blank line above that comment for better readability (same as > done for earlier 'else' in this same function) > > ~~~ > > 41. > + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, > + remote_slot->two_phase, > + remote_slot->failover, > + SYNCSLOT_STATE_INITIATED); > + > + slot = MyReplicationSlot; > > In hindsight, the prior if/else code blocks in this function also > could have done "slot = MyReplicationSlot;" same as this -- then the > code would be much less verbose. > > ~~~ > > 42. > + SpinLockAcquire(&slot->mutex); > + slot->data.database = get_database_oid(remote_slot->database, false); > + > + namestrcpy(&slot->data.plugin, remote_slot->plugin); > + SpinLockRelease(&slot->mutex); > > IMO the code would be more readable *without* a blank line here > because the mutexed block is more obvious. > > ~~~ > > 43. > + /* > + * If the local restart_lsn and/or local catalog_xmin is ahead of > + * those on the remote then we cannot create the local slot in sync > + * with the primary server because that would mean moving the local > + * slot backwards and we might not have WALs retained for old LSN. In > + * this case we will wait for the primary server's restart_lsn and > + * catalog_xmin to catch up with the local one before attempting the > + * sync. > + */ > > 43a. > This comment describes some fundamental concepts about how this logic > works. I felt this and other comments like this should be at the top > of this slotsync.c file. Then anything that needs to mention about it > can refer to the top comment. For example, I also found other comments > like "... they are still waiting for the primary server to catch up." > to be difficult to understand without knowing these details, but I > think describing core design stuff up-front and saying "refer to the > comment atop the fil" probably would help a lot. > > ~ > > 43b. > Should "wait for the primary server's restart_lsn and..." be "wait for > the primary server slot's restart_lsn and..." ? > > ~~~ > > 44. > + { > + bool persist; > + > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, &persist)) > + { > + /* > + * The remote slot didn't catch up to locally reserved > + * position. > + * > + * We do not drop the slot because the restart_lsn can be > + * ahead of the current location when recreating the slot in > + * the next cycle. It may take more time to create such a > + * slot. Therefore, we persist it (provided remote-slot is > + * still valid) and attempt the wait and synchronization in > + * the next cycle. > + */ > + if (persist) > + { > + ReplicationSlotPersist(); > + *slot_updated = true; > + } > + > + goto cleanup; > + } > + } > > Looking at the way this 'persist' parameter is used I felt is it too > complicated. IIUC the wait_for_primary_slot_catchup can only return > *persist = true (for a false return) when it has reached/exceeded the > number of retries and still not yet caught up. Why should > wait_for_primary_slot_catchup() pretend to know about persistence? 
> > In other words, I thought a more meaningful parameter/variable name > (instead of 'persist') is something like 'wait_attempts_exceeded'. IMO > that will make wait_for_primary_slot_catchup() code easier, and here > you can just say like below, where the code matches the comment > better. Thoughts? > > + if (wait_attempts_exceeded) > + { > + ReplicationSlotPersist(); > + *slot_updated = true; > + } > yes, it will make code simpler. Changed it. > ~~~ > > 45. > + > + > + /* > + * Wait for primary is either not needed or is over. Update the lsns > + * and mark the slot as READY for further syncs. > + */ > > Double blank lines? > > ~~~ > > 46. > + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " > + "now", remote_slot->name)); > + } > + > +cleanup: > > Better to put the whole errmsg() on a newline instead of splitting the > string like that. > > ~~~ > > 47. synchronize_slots > > +/* > + * Synchronize slots. > + * > + * Gets the failover logical slots info from the primary server and update > + * the slots locally. Creates the slots if not present on the standby. > + * > + * Returns nap time for the next sync-cycle. > + */ > +static long > +synchronize_slots(WalReceiverConn *wrconn) > > /update/updates/ > > ~~~ > > 48. > + /* The primary_slot_name is not set yet or WALs not received yet */ > + SpinLockAcquire(&WalRcv->mutex); > + if (!WalRcv || > + (WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return naptime; > + } > + SpinLockRelease(&WalRcv->mutex); > > Just wondering if the scenario of "WALS not received" is a bit more > like "no activity" so perhaps the naptime returned should be > WORKER_INACTIVITY_NAPTIME_MS here? > This may happen if walreceiver is temporarily having some issue. Longer nap is not recommended here. We should check the state again after a short nap. > > ~~~ > > 49. > + /* Construct query to get slots info from the primary server */ > + initStringInfo(&s); > + construct_slot_query(&s); > > I did not like the construct_slot_query() to be separated from this > function because it makes it too difficult to see if the slot_attr > numbers and column types in this function are correct w.r.t. that > query. IMO better when everything is in the same place where you can > see it all together. e.g. Less risk of breaking something if changes > are made. > > ~~~ > > 50. > + /* Construct the remote_slot tuple and synchronize each slot locally */ > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > > Normally in all the other functions the variable 'slot' was the local > ReplicationSlot but IIUC here represents a remote tuple. Making a > different name would be better like 'remote_slottup' or something > else. > Have changed it to 'tupslot' to keep it short but different from slot. > ~~~ > > 51. > + /* > + * If any of the slots get updated in this sync-cycle, retain default > + * naptime and update 'last_update_time' in slot sync worker. But if no > + * activity is observed in this sync-cycle, then increase naptime provided > + * inactivity time reaches threshold. > + */ > > I think "retain" is a slightly wrong word here because it might have > been WORKER_INACTIVITY_NAPTIME_MS in the previous cycle. > > Maybe just /retain/use/ > > ~~~ > > 52. > +/* > + * Connects primary to validate the slot specified in primary_slot_name. > + * > + * Exits the worker if physical slot with the specified name does not exist. 
> + */ > +static void > +validate_primary_slot(WalReceiverConn *wrconn) > > There is already a connection, so not sure if this connect should be > saying "connects to"; Maybe is should be saying more like below: > > SUGGESTION > Using the specified primary server connection, validate if the > physical slot identified by GUC primary_slot_name exists. > > Exit the worker if the slot is not found. > > ~~~ > > 53. > + initStringInfo(&cmd); > + appendStringInfo(&cmd, > + "select count(*) = 1 from pg_replication_slots where " > + "slot_type='physical' and slot_name=%s", > + quote_literal_cstr(PrimarySlotName)); > > Write the SQL keywords in uppercase. > > ~~~ > > 54. > + if (res->status != WALRCV_OK_TUPLES) > + ereport(ERROR, > + (errmsg("could not fetch primary_slot_name info from the " > + "primary: %s", res->err))); > > Shouldn't the name of the unfound slot be shown in the ereport, or > will that already appear in the res->err? > > ~~~ > > 55. > + ereport(ERROR, > + errmsg("exiting slots synchronization as slot specified in " > + "primary_slot_name is not valid")); > + > > IMO the format should be the same as I suggested (later) for all the > validate_slotsync_parameters() errors. > > Also, I think the name of the unfound slot needs to be in this message. > > So maybe result is like this: > > SUGGESTION > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: second %s is a GUC variable name */ > errhint("The primary slot \"%s\" specified by %s is not valid.", > slot_name, "primary_slot_name") > ); > > ~~~ > > 56. > +/* > + * Checks if GUCs are set appropriately before starting slot sync worker > + */ > +static void > +validate_slotsync_parameters(char **dbname) > +{ > + /* > + * Since 'enable_syncslot' is ON, check that other GUC settings > + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) > + * are compatible with slot synchronization. If not, raise ERROR. > + */ > + > > 56a. > I thought that 2nd comment sort of belonged in the function comment. > > ~ > > 56b. > It says "Since 'enable_syncslot' is ON", but I IIUC that is wrong > because the other function slotsync_reread_config() might detect a > change in this GUC and cause this validate_slotsync_parameters() to be > called when enable_syncslot was changed to false. > > In other words, I think you also need to check 'enable_syncslot' and > exit with appropriate ERROR same as all the other config problems. > > OTOH if this is not possible, then the slotsync_reread_config() might > need fixing instead. > 'enable_syncslot' is recently changed to PGC_POSTMASTER from PGC_SIGHUP. Thus 'slotsync_reread_config' also needs to get rid of 'enable_syncslot'. I have changed that now. Slightly changed the comment as well. > ~~~ > > 57. > + /* > + * A physical replication slot(primary_slot_name) is required on the > + * primary to ensure that the rows needed by the standby are not removed > + * after restarting, so that the synchronized slot on the standby will not > + * be invalidated. > + */ > + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) > + ereport(ERROR, > + errmsg("exiting slots synchronization as primary_slot_name is " > + "not set")); > + > + /* > + * Hot_standby_feedback must be enabled to cooperate with the physical > + * replication slot, which allows informing the primary about the xmin and > + * catalog_xmin values on the standby. 
> + */ > + if (!hot_standby_feedback) > + ereport(ERROR, > + errmsg("exiting slots synchronization as hot_standby_feedback " > + "is off")); > + > + /* > + * Logical decoding requires wal_level >= logical and we currently only > + * synchronize logical slots. > + */ > + if (wal_level < WAL_LEVEL_LOGICAL) > + ereport(ERROR, > + errmsg("exiting slots synchronisation as it requires " > + "wal_level >= logical")); > + > + /* > + * The primary_conninfo is required to make connection to primary for > + * getting slots information. > + */ > + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) > + ereport(ERROR, > + errmsg("exiting slots synchronization as primary_conninfo " > + "is not set")); > + > + /* > + * The slot sync worker needs a database connection for walrcv_exec to > + * work. > + */ > + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + if (*dbname == NULL) > + ereport(ERROR, > + errmsg("exiting slots synchronization as dbname is not " > + "specified in primary_conninfo")); > + > +} > > IMO all these errors can be improved by: > - using a common format > - including errhint for the reason > - using the same tone for instructions on what to do (e.g saying must > be set, rather than what was not set) > > SUGGESTION (something like this) > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: %s is a GUC variable name */ > errhint("%s must be defined.", "primary_slot_name") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: %s is a GUC variable name */ > errhint("%s must be enabled.", "hot_standby_feedback") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: wal_level is a GUC variable name, 'logical' is a value */ > errhint("wal_level must be >= logical.") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: %s is a GUC variable name */ > errhint("%s must be defined.", "primary_conninfo") > ); > > ereport(ERROR, > errmsg("exiting from slot synchronization due to bad configuration") > /* translator: 'dbname' is a specific option; %s is a GUC variable name */ > errhint("'dbname' must be specified in %s.", "primary_conninfo") > ); > > ~~~ > > 58. > + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + if (*dbname == NULL) > + ereport(ERROR, > + errmsg("exiting slots synchronization as dbname is not specified in > primary_conninfo")); > + > +} > > Unnecessary blank line at the end of the function > > ~~~ > > 59. > +/* > + * Re-read the config file. > + * > + * If any of the slot sync GUCs changed, validate the values again > + * through validate_slotsync_parameters() which will exit the worker > + * if validaity fails. > + */ > > SUGGESTION > If any of the slot sync GUCs have changed, re-validate them. The > worker will exit if the check fails. > > ~~~ > > 60. > + char *conninfo = pstrdup(PrimaryConnInfo); > + char *slotname = pstrdup(PrimarySlotName); > + bool syncslot = enable_syncslot; > + bool standbyfeedback = hot_standby_feedback; > > For clarity, I would have used var names to match the old GUCs. > > e.g. > /conninfo/old_primary_conninfo/ > /slotname/old_primary_slot_name/ > /syncslot/old_enable_syncslot/ > /standbyfeedback/old_hot_standby_feedback/ > > ~~~ > > 61. > + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + Assert(dbname); > > This code seems premature. 
IIUC this is only needed to detect that the > dbname was changed. But I think the prerequisite is first that the > conninfoChanged is true. So really this code should be guarded by if > (conninfoChanged) so it can be done later in the function. > Once PrimaryConnInfo is changed, we can not get old-dbname. So it is required to be done before we reach 'conninfoChanged' > ~~~ > > 62. > + if (conninfoChanged || slotnameChanged || > + (syncslot != enable_syncslot) || > + (standbyfeedback != hot_standby_feedback)) > + { > + revalidate = true; > + } > > SUGGESTION > > revalidate = conninfoChanged || slotnameChanged || > (syncslot != enable_syncslot) || > (standbyfeedback != hot_standby_feedback); > > ~~~ > > 63. > + /* > + * Since we have initialized this worker with old dbname, thus exit if > + * dbname changed. Let it get restarted and connect to new dbname > + * specified. > + */ > + if (conninfoChanged && strcmp(dbname, new_dbname) != 0) > + { > + ereport(ERROR, > + errmsg("exiting slot sync woker as dbname in " > + "primary_conninfo changed")); > + } > > 63a. > /old dbname/the old dbname/ > /new dbname/the new dbname/ > /woker/worker/ > > ~ > > 63b. > This code feels awkward. Can't this dbname check and accompanying > ERROR message be moved down into validate_slotsync_parameters(), so it > lives along with all the other GUC validation logic? Maybe you'll need > to change the validate_slotsync_parameters() parameters slightly but I > think it is much better to keep all the validation together. > > ~~~ > > 64. > + > + > +/* > + * Interrupt handler for main loop of slot sync worker. > + */ > +static void > +ProcessSlotSyncInterrupts(WalReceiverConn **wrconn) > > Double blank lines. > > ~~~ > > 65. > + > + > + if (ConfigReloadPending) > + slotsync_reread_config(); > +} > > Double blank lines > > ~~~ > > 66. slotsync_worker_onexit > > +static void > +slotsync_worker_onexit(int code, Datum arg) > +{ > + SpinLockAcquire(&SlotSyncWorker->mutex); > + SlotSyncWorker->pid = 0; > + SpinLockRelease(&SlotSyncWorker->mutex); > +} > > Should assignment use InvalidPid (-1) instead of 0? > > ~~~ > > 67. ReplSlotSyncWorkerMain > > + SpinLockAcquire(&SlotSyncWorker->mutex); > + > + Assert(SlotSyncWorker->pid == 0); > + > + /* Advertise our PID so that the startup process can kill us on promotion */ > + SlotSyncWorker->pid = MyProcPid; > + > + SpinLockRelease(&SlotSyncWorker->mutex); > > Shouldn't pid start as InvalidPid (-1) instead of Assert 0? > > ~~~ > > 68. > + /* Connect to the primary server */ > + wrconn = remote_connect(); > + > + /* > + * Connect to primary and validate the slot specified in > + * primary_slot_name. > + */ > + validate_primary_slot(wrconn); > > Maybe needs some slight rewording in the 2nd comment. "Connect to > primary server" is already said and done in the 1st part. > > ~~~ > > 69. IsSlotSyncWorker > > +/* > + * Is current process the slot sync worker? > + */ > +bool > +IsSlotSyncWorker(void) > +{ > + return SlotSyncWorker->pid == MyProcPid; > +} > > 69a. > For consistency with others like it, I thought this be called > IsLogicalSlotSyncWorker(). > > ~ > > 69b. > For consistency with the others like this, I think the extern should > be declared in logicalworker.h > > ~~~ > > 70. ShutDownSlotSync > > + SpinLockAcquire(&SlotSyncWorker->mutex); > + if (!SlotSyncWorker->pid) > + { > + SpinLockRelease(&SlotSyncWorker->mutex); > + return; > + } > > IMO should be comparing with InvalidPid (-1) here; not 0. > > ~~~ > > 71. > + SpinLockAcquire(&SlotSyncWorker->mutex); > + > + /* Is it gone? 
*/ > + if (!SlotSyncWorker->pid) > + break; > + > + SpinLockRelease(&SlotSyncWorker->mutex); > > Ditto. bad pids should be InvalidPid (-1), not 0. > > ~~~ > > 72. SlotSyncWorkerShmemInit > > + if (!found) > + { > + memset(SlotSyncWorker, 0, size); > + SpinLockInit(&SlotSyncWorker->mutex); > + } > > Probably here the unassigned pid should be set to InvalidPid (-1), not 0. > > ~~~ > > 73. SlotSyncWorkerRegister > > + if (!enable_syncslot) > + { > + ereport(LOG, > + errmsg("skipping slots synchronization as enable_syncslot is " > + "disabled.")); > + return; > + } > > /as/because/ > > ====== > src/backend/replication/logical/tablesync.c > > 74. > #include "commands/copy.h" > +#include "commands/subscriptioncmds.h" > #include "miscadmin.h" > > There were only #include changes but no code changes. Is the #include needed? > > ====== > src/backend/replication/slot.c There is some change in way headers are included. I need to review it in detail. Keeping it on hold. I tried to explain few points on this in [1] (see last comment) [1]: https://www.postgresql.org/message-id/CAJpy0uD6dWUvBgy8MGdugf_Am4pLXTL_vqcwSeHO13v%2BMzc9KA%40mail.gmail.com > > 75. ReplicationSlotCreate > > void > ReplicationSlotCreate(const char *name, bool db_specific, > ReplicationSlotPersistency persistency, > - bool two_phase, bool failover) > + bool two_phase, bool failover, char sync_state) > > The function comment goes to trouble to describe all the parameters > except for 'failover' and 'sync_slate'. I think a failover comment > should be added in patch 0001 and then the sync_state comment should > be added in patch 0002. > > ~~~ > > 76. > + /* > + * Do not allow users to drop the slots which are currently being synced > + * from the primary to the standby. > + */ > + if (user_cmd && RecoveryInProgress() && > + MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE) > + { > + ReplicationSlotRelease(); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot drop replication slot \"%s\"", name), > + errdetail("This slot is being synced from the primary."))); > + } > > Should the errdetail give the current state? > I feel current info looks good. User can always use pg_replication_slots to monitor sync_state info. > > ====== > src/backend/tcop/postgres.c > > 77. > + else if (IsSlotSyncWorker()) > + { > + ereport(DEBUG1, > + (errmsg_internal("replication slot sync worker is shutting down due > to administrator command"))); > + > + /* > + * Slot sync worker can be stopped at any time. > + * Use exit status 1 so the background worker is restarted. > + */ > + proc_exit(1); > + } > > Explicitly saying "ereport(DEBUG1, errmsg_internal(..." is a bit > overkill; it is simpler to write this as "elog(DEBUG1, ....); > > ====== > src/include/replication/slot.h > > 78. > +/* The possible values for 'sync_state' in ReplicationSlotPersistentData */ > +#define SYNCSLOT_STATE_NONE 'n' /* None for user created slots */ > +#define SYNCSLOT_STATE_INITIATED 'i' /* Sync initiated for the slot but > + * not completed yet, waiting for > + * the primary server to catch-up */ > +#define SYNCSLOT_STATE_READY 'r' /* Initialization complete, ready > + * to be synced further */ > > Already questioned the same elsewhere. IIUC the same tri-state values > of other attributes might be used here too without needing to > introduce 3 new values. > > e.g. 
> > #define SYNCSLOT_STATE_DISABLED 'd' /* No syncing for this slot */ > #define SYNCSLOT_STATE_PENDING 'p' /* Sync is enabled but we must > wait for the primary server to catch up */ > #define SYNCSLOT_STATE_ENABLED 'e' /* Sync is enabled and the slot is > ready to be synced */ > responded in comment 11. > ~~~ > > 79. > + /* > + * Is this a slot created by a sync-slot worker? > + * > + * Relevant for logical slots on the physical standby. > + */ > + char sync_state; > + > > I assumed that "Relevant for" means "Only relevant for". It should say that. > > If correct, IMO a better field name might be 'standby_sync_state' > sync_state has some value for primary too. It is not null on primary. Thus current name seems a better choice. > ====== > src/test/recovery/t/050_verify_slot_order.pl > > 80. > +$backup_name = 'backup2'; > +$primary->backup($backup_name); > + > +# Create standby3 > +my $standby3 = PostgreSQL::Test::Cluster->new('standby3'); > +$standby3->init_from_backup( > + $primary, $backup_name, > + has_streaming => 1, > + has_restoring => 1); > > The mixture of 'backup2' for 'standby3' seems confusing. Is there a > reason to call it backup2? > > ~~~ > > 81. > +# Verify slot properties on the standby > +is( $standby3->safe_psql('postgres', > + q{SELECT failover, sync_state FROM pg_replication_slots WHERE > slot_name = 'lsub1_slot';} > + ), > + "t|r", > + 'logical slot has sync_state as ready and failover as true on standby'); > > It might be better if the message has the same order as the SQL. Eg. > "failover as true and sync_state as ready". > > ~~~ > > 82. > +# Verify slot properties on the primary > +is( $primary->safe_psql('postgres', > + q{SELECT failover, sync_state FROM pg_replication_slots WHERE > slot_name = 'lsub1_slot';} > + ), > + "t|n", > + 'logical slot has sync_state as none and failover as true on primary'); > + > > It might be better if the message has the same order as the SQL. Eg. > "failover as true and sync_state as none". > > ~~~ > > 83. > +# Test to confirm that restart_lsn of the logical slot on the primary > is synced to the standby > > IMO the major test parts (like this one) may need more highlighting "# > ---------------------" so those comments don't get lost among all the > other comments. > > ~~~ > > 84. > +# let the slots get synced on the standby > +sleep 2; > > Won't this make the test prone to failure on slow machines? Is there > not a more deterministic way to wait for the sync? > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
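For reference, the uppercase form asked for in comment 53 above would read roughly as follows; the slot name literal is only a stand-in for whatever primary_slot_name happens to be set to:

    SELECT count(*) = 1 FROM pg_replication_slots
        WHERE slot_type = 'physical' AND slot_name = 'my_primary_slot';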
On Mon, Dec 11, 2023 at 2:21 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 1:47 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Dec 8, 2023 at 2:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > PFA v43, changes are: > > > > > > > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > > clear to me whether we want to allow physical standbys to further wait > > > for cascading standby to sync their slots. If we allow such a feature > > > one may expect even primary to wait for all the cascading standby's > > > because otherwise still logical subscriber can be ahead of one of the > > > cascading standby. I feel even if we want to allow such a behaviour we > > > can do it later once the main feature is committed. I think it would > > > be good to just allow logical walsenders on primary to wait for > > > physical standbys represented by GUC 'standby_slot_names'. If we agree > > > on that then it would be good to prohibit setting this GUC on standby > > > or at least it should be a no-op even if this GUC should be set on > > > physical standby. > > > > > > Thoughts? > > > > IMHO, why not keep the behavior consistent across primary and standby? > > I mean if it doesn't require a lot of new code/design addition then > > it should be the user's responsibility. I mean if the user has set > > 'standby_slot_names' on standby then let standby also wait for > > cascading standby to sync their slots? Is there any issue with that > > behavior? > > > > Without waiting for cascading standby on primary, it won't be helpful > to just wait on standby. > > Currently logical walsenders on primary waits for physical standbys to > take changes before they update their own logical slots. But they wait > only for their immediate standbys and not for cascading standbys. > Although, on first standby, we do have logic where slot-sync workers > wait for cascading standbys before they update their own slots (synced > ones, see patch3). But this does not guarantee that logical > subscribers on primary will never be ahead of the cascading standbys. > Let us consider this timeline: > > t1: logical walsender on primary waiting for standby1 (first standby). > t2: physical walsender on standby1 is stuck and thus there is delay in > sending these changes to standby2 (cascading standby). > t3: standby1 has taken changes and sends confirmation to primary. > t4: logical walsender on primary receives confirmation from standby1 > and updates slot, logical subscribers of primary also receives the > changes. > t5: standby2 has not received changes yet as physical walsender on > standby1 is still stuck, slotsync worker still waiting for standby2 > (cascading) before it updates its own slots (synced ones). > t6: standby2 is promoted to become primary. > > Now we are in a state wherein primary, logical subscriber and first > standby has some changes but cascading standby does not. And logical > slots on primary were updated w/o confirming if cascading standby has > taken changes or not. This is a problem and we do not have a simple > solution for this yet. Okay, I think that makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Dec 11, 2023 at 1:22 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On 12/8/23 10:06 AM, Amit Kapila wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> wrote: > >> > >> PFA v43, changes are: > >> > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > clear to me whether we want to allow physical standbys to further wait > > for cascading standby to sync their slots. If we allow such a feature > > one may expect even primary to wait for all the cascading standby's > > because otherwise still logical subscriber can be ahead of one of the > > cascading standby. > > I've the same feeling here. I think it would probably be expected that > the primary also wait for all the cascading standby. > > > I feel even if we want to allow such a behaviour we > > can do it later once the main feature is committed. > > Agree. > > > I think it would > > be good to just allow logical walsenders on primary to wait for > > physical standbys represented by GUC 'standby_slot_names'. > > That makes sense for me for v1. > > > If we agree > > on that then it would be good to prohibit setting this GUC on standby > > or at least it should be a no-op even if this GUC should be set on > > physical standby. > > I'd prefer to completely prohibit it on standby (to make it very clear it's not > working at all) as long as one can enable it without downtime once the standby > is promoted (which is the case currently). And I think slot-sync worker should exit as well on cascading standby. Thoughts? If we agree on the above, then we need to look for a way to distinguish between first and cascading standby. I could not find any existing way to do so. One possible approach is to connect to the remote using PrimaryConninfo and run 'pg_is_in_recovery()' there, if it returns true, then it means we are cascading standby. Any simpler way to achieve this? thanks Shveta
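A minimal sketch of that check, assuming the slot sync worker simply issues it over the connection built from primary_conninfo (pg_is_in_recovery() is an existing function; reading a true result as "we are a cascading standby" is the interpretation proposed above):

    -- Run against the node that primary_conninfo points at; true means that
    -- node is itself in recovery, i.e. we are a cascading standby.
    SELECT pg_is_in_recovery();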
On Mon, Dec 11, 2023 at 2:41 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > 5. > > +synchronize_slots(WalReceiverConn *wrconn) > > { > > ... > > ... > > + /* The syscache access needs a transaction env. */ > > + StartTransactionCommand(); > > + > > + /* > > + * Make result tuples live outside TopTransactionContext to make them > > + * accessible even after transaction is committed. > > + */ > > + MemoryContextSwitchTo(oldctx); > > + > > + /* Construct query to get slots info from the primary server */ > > + initStringInfo(&s); > > + construct_slot_query(&s); > > + > > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); > > + > > + /* Execute the query */ > > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); > > + pfree(s.data); > > + > > + if (res->status != WALRCV_OK_TUPLES) > > + ereport(ERROR, > > + (errmsg("could not fetch failover logical slots info " > > + "from the primary server: %s", res->err))); > > + > > + CommitTransactionCommand(); > > ... > > ... > > } > > > > Where exactly in the above code, there is a syscache access as > > mentioned above StartTransactionCommand()? > > > > It is in walrcv_exec (libpqrcv_processTuples). I have changed the > comments to add this info. > Okay, I see that the patch switches context twice once after starting the transaction and the second time after committing the transaction, why is that required? Also, can't we extend the duration of the transaction till the remote_slot information is constructed? I am asking this because the context used is TopMemoryContext which should be used only if we need something specific to be retained at the process level which doesn't seem to be the case here. I have noticed a few other minor things: 1. postgres=# select * from pg_logical_slot_get_changes('log_slot_2', NULL, NULL); ERROR: cannot use replication slot "log_slot_2" for logical decoding DETAIL: This slot is being synced from the primary server. ... ... postgres=# select * from pg_drop_replication_slot('log_slot_2'); ERROR: cannot drop replication slot "log_slot_2" DETAIL: This slot is being synced from the primary. I think the DETAIL message should be the same in the above two cases. 2. +void +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) +{ + List *standby_slots; + + Assert(!am_walsender); + + if (!MyReplicationSlot->data.failover) + return; + + standby_slots = GetStandbySlotList(true); + + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); ... ... Shouldn't we return if the standby slot names list is NIL unless there is a reason to do ConditionVariablePrepareToSleep() or any of the code following it? -- With Regards, Amit Kapila.
On Mon, Dec 11, 2023 at 7:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 2:41 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > 5. > > > +synchronize_slots(WalReceiverConn *wrconn) > > > { > > > ... > > > ... > > > + /* The syscache access needs a transaction env. */ > > > + StartTransactionCommand(); > > > + > > > + /* > > > + * Make result tuples live outside TopTransactionContext to make them > > > + * accessible even after transaction is committed. > > > + */ > > > + MemoryContextSwitchTo(oldctx); > > > + > > > + /* Construct query to get slots info from the primary server */ > > > + initStringInfo(&s); > > > + construct_slot_query(&s); > > > + > > > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); > > > + > > > + /* Execute the query */ > > > + res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); > > > + pfree(s.data); > > > + > > > + if (res->status != WALRCV_OK_TUPLES) > > > + ereport(ERROR, > > > + (errmsg("could not fetch failover logical slots info " > > > + "from the primary server: %s", res->err))); > > > + > > > + CommitTransactionCommand(); > > > ... > > > ... > > > } > > > > > > Where exactly in the above code, there is a syscache access as > > > mentioned above StartTransactionCommand()? > > > > > > > It is in walrcv_exec (libpqrcv_processTuples). I have changed the > > comments to add this info. > > > > Okay, I see that the patch switches context twice once after starting > the transaction and the second time after committing the transaction, > why is that required? Also, can't we extend the duration of the > transaction till the remote_slot information is constructed? If we extend duration, we have to extend till remote_slot information is consumed and not only till it is constructed. > I am > asking this because the context used is TopMemoryContext which should > be used only if we need something specific to be retained at the > process level which doesn't seem to be the case here. > Okay, I understand your concern. But this needs more thoughts on shall we have all the slots synchronized in one txn or is it better to have it existing way i.e. each slot being synchronized in its own txn started in synchronize_one_slot. If we go by the former, can it have any implications? I need to review this bit more before concluding. . > I have noticed a few other minor things: > 1. > postgres=# select * from pg_logical_slot_get_changes('log_slot_2', NULL, NULL); > ERROR: cannot use replication slot "log_slot_2" for logical decoding > DETAIL: This slot is being synced from the primary server. > ... > ... > postgres=# select * from pg_drop_replication_slot('log_slot_2'); > ERROR: cannot drop replication slot "log_slot_2" > DETAIL: This slot is being synced from the primary. > > I think the DETAIL message should be the same in the above two cases. > > 2. > +void > +WalSndWaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) > +{ > + List *standby_slots; > + > + Assert(!am_walsender); > + > + if (!MyReplicationSlot->data.failover) > + return; > + > + standby_slots = GetStandbySlotList(true); > + > + ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); > ... > ... > > Shouldn't we return if the standby slot names list is NIL unless there > is a reason to do ConditionVariablePrepareToSleep() or any of the code > following it? > > -- > With Regards, > Amit Kapila.
On Monday, December 11, 2023 3:52 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote: Hi, > On 12/8/23 10:06 AM, Amit Kapila wrote: > > On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta.malik@gmail.com> > wrote: > >> > >> PFA v43, changes are: > >> > > > > I wanted to discuss 0003 patch about cascading standby's. It is not > > clear to me whether we want to allow physical standbys to further wait > > for cascading standby to sync their slots. If we allow such a feature > > one may expect even primary to wait for all the cascading standby's > > because otherwise still logical subscriber can be ahead of one of the > > cascading standby. > > I've the same feeling here. I think it would probably be expected that the > primary also wait for all the cascading standby. > > > I feel even if we want to allow such a behaviour we can do it later > > once the main feature is committed. > > Agree. > > > I think it would > > be good to just allow logical walsenders on primary to wait for > > physical standbys represented by GUC 'standby_slot_names'. > > That makes sense for me for v1. > > > If we agree > > on that then it would be good to prohibit setting this GUC on standby > > or at least it should be a no-op even if this GUC should be set on > > physical standby. > > I'd prefer to completely prohibit it on standby (to make it very clear it's not > working at all) as long as one can enable it without downtime once the standby > is promoted (which is the case currently). I think we could not check if we are in a standby server in the GUC check_hook, because the XLogCtl(which is checked in RecoveryInProgress) may have not been initialized yet. Besides, other GUCs like synchronous_standby_names also don't work on standby but it will be no-op. So I feel we can also ignore standby_slot_names on standby. What do you think ? Here is the V46 patch set which changed the following things: V46-0001: * Address Peter[1] and Amit's[2] comments. * Fix one CFbot failure in meson build. * Ignore the standby_slot_names on a standby server since we don't support syncing slots to cascade standby. V46-0002: 1) Fix for CFBot make warning. 2) Cascading support removal. Now we do not need to check 'sync_state != 'i'' in the query while fetching failover slots. This check was needed on the cascading standby to fetch failover slots from the first standby. 3) Test correction and optimization. 0003 patch is removed since we agreed not to support syncing slots to cascading standby. Thanks Shveta for working on the changes in V46-0002 and thanks Ajin for working on the test optimization. -- TODO There are few pending comments that mentioned in [3][4][5] which are still in progress. [1] https://www.postgresql.org/message-id/CAHut%2BPsf9z132WNgy0Gr10ZTnonpNjvTBj74wG8kSxXU4rOD7g%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAA4eK1%2BCXpfiTLbYRaOoUBP9Z1-xJZdX6QOp14rCdaF5E2gsgQ%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAJpy0uDaGMNpgmdxie-MgHmMhnD4ET_LDjQNEe76xJ%2BMLqRQ8Q%40mail.gmail.com [4] https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com [5] https://www.postgresql.org/message-id/CAJpy0uC-8mrn6jakcFjSVmbJiHZs-Okq8YKxGfrMLPD-2%3DwOqQ%40mail.gmail.com Best Regards, Hou zj
On Monday, December 11, 2023 3:32 PM Peter Smith <smithpb2250@gmail.com> > > Here are some review comments for v44-0001 > > ====== > src/backend/replication/slot.c > > > 1. ReplicationSlotCreate > > * during getting changes, if the two_phase option is enabled it can skip > * prepare because by that time start decoding point has been moved. So > the > * user will only get commit prepared. > + * failover: Allows the slot to be synced to physical standbys so that logical > + * replication can be resumed after failover. > */ > void > ReplicationSlotCreate(const char *name, bool db_specific, > > ~ > > /Allows the slot.../If enabled, allows the slot.../ Changed. > > ====== > > 2. validate_standby_slots > > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + ListCell *lc; > + bool ok = true; > + > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into list of identifiers */ if (!(ok > + = SplitIdentifierString(rawname, ',', &elemlist))) > + GUC_check_errdetail("List syntax is invalid."); > + > + /* > + * If there is a syntax error in the name or if the replication slots' > + * data is not initialized yet (i.e., we are in the startup process), > + skip > + * the slot verification. > + */ > + if (!ok || !ReplicationSlotCtl) > + { > + pfree(rawname); > + list_free(elemlist); > + return ok; > + } > > > 2a. > You don't need to initialize 'ok' during declaration because it is assigned > immediately anyway. > > ~ > > 2b. > AFAIK assignment within a conditional like this is not a normal PG coding style > unless there is no other way to do it. > Changed. > ~ > > 2c. > /into list/into a list/ > > SUGGESTION > /* Verify syntax and parse string into a list of identifiers */ ok = > SplitIdentifierString(rawname, ',', &elemlist); if (!ok) > GUC_check_errdetail("List syntax is invalid."); > > Changed. > ~~~ > > 3. assign_standby_slot_names > > + if (!SplitIdentifierString(standby_slot_names_cpy, ',', > + &standby_slots)) { > + /* This should not happen if GUC checked check_standby_slot_names. */ > + elog(ERROR, "list syntax is invalid"); } > > This error here and in validate_standby_slots() are different -- "list" versus > "List". > The message has been changed to "invalid list syntax" to be consistent with other elog. > ====== > src/backend/replication/walsender.c > > > 4. WalSndFilterStandbySlots > > > + foreach(lc, standby_slots_cpy) > + { > + char *name = lfirst(lc); > + XLogRecPtr restart_lsn = InvalidXLogRecPtr; bool invalidated = false; > + char *warningfmt = NULL; > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (slot && SlotIsPhysical(slot)) > + { > + SpinLockAcquire(&slot->mutex); > + restart_lsn = slot->data.restart_lsn; > + invalidated = slot->data.invalidated != RS_INVAL_NONE; > + SpinLockRelease(&slot->mutex); } > + > + /* Continue if the current slot hasn't caught up. 
*/ if (!invalidated > + && !XLogRecPtrIsInvalid(restart_lsn) && restart_lsn < wait_for_lsn) { > + /* Log warning if no active_pid for this physical slot */ if > + (slot->active_pid == 0) ereport(WARNING, errmsg("replication slot > + \"%s\" specified in parameter \"%s\" does > not have active_pid", > + name, "standby_slot_names"), > + errdetail("Logical replication is waiting on the " > + "standby associated with \"%s\"", name), errhint("Consider starting > + standby associated with " > + "\"%s\" or amend standby_slot_names", name)); > + > + continue; > + } > + else if (!slot) > + { > + /* > + * It may happen that the slot specified in standby_slot_names GUC > + * value is dropped, so let's skip over it. > + */ > + warningfmt = _("replication slot \"%s\" specified in parameter > \"%s\" does not exist, ignoring"); > + } > + else if (SlotIsLogical(slot)) > + { > + /* > + * If a logical slot name is provided in standby_slot_names, issue > + * a WARNING and skip it. Although logical slots are disallowed in > + * the GUC check_hook(validate_standby_slots), it is still > + * possible for a user to drop an existing physical slot and > + * recreate a logical slot with the same name. Since it is > + * harmless, a WARNING should be enough, no need to error-out. > + */ > + warningfmt = _("cannot have logical replication slot \"%s\" in > parameter \"%s\", ignoring"); > + } > + else if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) { > + /* > + * Specified physical slot may have been invalidated, so there is no > + point > + * in waiting for it. > + */ > + warningfmt = _("physical slot \"%s\" specified in parameter \"%s\" > has been invalidated, ignoring"); > + } > + else > + { > + Assert(restart_lsn >= wait_for_lsn); > + } > > This if/else chain seems structured awkwardly. IMO it would be tidier to > eliminate the NULL slot and IsLogicalSlot up-front, which would also simplify > some of the subsequent conditions > > SUGGESTION > > slot = SearchNamedReplicationSlot(name, true); > > if (!slot) > { > ... > } > else if (SlotIsLogical(slot)) > { > ... > } > else > { > Assert(SlotIsPhysical(slot)) > > SpinLockAcquire(&slot->mutex); > restart_lsn = slot->data.restart_lsn; > invalidated = slot->data.invalidated != RS_INVAL_NONE; > SpinLockRelease(&slot->mutex); > > if (XLogRecPtrIsInvalid(restart_lsn) || invalidated) > { > ... > } > else if (!invalidated && !XLogRecPtrIsInvalid(restart_lsn) && restart_lsn < > wait_for_lsn) > { > ... > } > else > { > Assert(restart_lsn >= wait_for_lsn); > } > } > Changed. > ~~~~ > > 5. WalSndWaitForWal > > + else > + { > + /* already caught up and doesn't need to wait for standby_slots */ > break; > + } > > /Already/already/ > Changed. > ====== > src/test/recovery/t/050_standby_failover_slots_sync.pl > > > 6. > +$subscriber1->safe_psql('postgres', > + "CREATE TABLE tab_int (a int PRIMARY KEY);"); > + > +# Create a subscription with failover = true > +$subscriber1->safe_psql('postgres', > + "CREATE SUBSCRIPTION regress_mysub1 CONNECTION > '$publisher_connstr' " > + . "PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, > failover = true);" > +); > > > Consider combining these DDL statements. > Changed. > ~~~ > > 7. > +$subscriber2->safe_psql('postgres', > + "CREATE TABLE tab_int (a int PRIMARY KEY);"); > +$subscriber2->safe_psql('postgres', > + "CREATE SUBSCRIPTION regress_mysub2 CONNECTION > '$publisher_connstr' " > + . "PUBLICATION regress_mypub WITH (slot_name = lsub2_slot);"); > > Consider combining these DDL statements > Changed. > ~~~ > > 8. 
> +# Stop the standby associated with specified physical replication slot > +so that # the logical replication slot won't receive changes until the > +standby slot's # restart_lsn is advanced or the slots is removed from > +the standby_slot_names # list $publisher->safe_psql('postgres', > +"TRUNCATE tab_int;"); $publisher->wait_for_catchup('regress_mysub1'); > +$standby1->stop; > > /with specified/with the specified/ > > /or the slots is/or the slot is/ > Changed. > ~~~ > > 9. > +# Create some data on primary > > /on primary/on the primary/ > Changed. > ~~~ > > 10. > +$result = > + $subscriber1->safe_psql('postgres', "SELECT count(*) = 10 FROM > +tab_int;"); is($result, 't', > + "subscriber1 doesn't get data as the sb1_slot doesn't catch up"); > > > I felt instead of checking for 10 maybe it's more consistent with the previous > code to assign again that $primary_row_count variable to 20; > > Then check that those primary rows are not all yet received like: > > SELECT count(*) < $primary_row_count FROM tab_int; > I think we'd better check the accurate number here to make sure the number is what we expect. > ~~~ > > 11. > +# Now that the standby lsn has advanced, primary must send the decoded > +# changes to the subscription. > +$publisher->wait_for_catchup('regress_mysub1'); > +$result = > + $subscriber1->safe_psql('postgres', "SELECT count(*) = 20 FROM > +tab_int;"); is($result, 't', > + "subscriber1 gets data from primary after standby1 is removed from > the standby_slot_names list" > +); > > /primary must/the primary must/ > > (continuing the suggestion from the previous review comment) > > Now this SQL can use the variable too: > > subscriber1->safe_psql('postgres', "SELECT count(*) = > $primary_row_count FROM tab_int;"); > Changed. > ~~~ > > 12. > + > +# Create another subscription enabling failover > +$subscriber1->safe_psql('postgres', > + "CREATE SUBSCRIPTION regress_mysub3 CONNECTION > '$publisher_connstr' " > + . "PUBLICATION regress_mypub WITH (slot_name = lsub3_slot, > copy_data=false, failover = true, create_slot = false);" > +); > > > Maybe give some more information in that comment: > > SUGGESTION > Create another subscription (using the same slot created above) that enables > failover. > Added. Best Regards, Hou zj
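For what it is worth, the "combine these DDL statements" suggestions (comments 6 and 7 above) simply amount to sending both statements in one safe_psql() call, something along these lines, with '$publisher_connstr' standing for the test's publisher connection string:

    CREATE TABLE tab_int (a int PRIMARY KEY);
    CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr'
        PUBLICATION regress_mypub
        WITH (slot_name = lsub1_slot, failover = true);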
A review on v45 patch: If one creates a logical slot with failover=true as - select pg_create_logical_replication_slot('logical_slot','pgoutput', false, true, true); Then, uses the existing logical slot while creating a subscription - postgres=# create subscription sub4 connection 'dbname=postgres host=localhost port=5433' publication pub1t4 WITH (slot_name=logical_slot, create_slot=false, failover=true); NOTICE: changed the failover state of replication slot "logical_slot" on publisher to false CREATE SUBSCRIPTION Despite configuring logical_slot's failover to true and specifying failover=true during subscription creation, the NOTICE indicates a change in the failover state to 'false', without providing any explanation for this transition. It can be confusing for users, so IMO, the notice should include the reason for switching failover to 'false' or should give a hint to use either refresh=false or copy_data=false to enable failover=true for the slot as we do in other similar 'alter subscription...' scenarios. -- Thanks & Regards, Nisha
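A minimal sketch of the workaround such a hint would point at, reusing the names from the report above and assuming copy_data = false is what lets the existing slot keep failover = true, as the similar ALTER SUBSCRIPTION hints imply:

    CREATE SUBSCRIPTION sub4
        CONNECTION 'dbname=postgres host=localhost port=5433'
        PUBLICATION pub1t4
        WITH (slot_name = logical_slot, create_slot = false,
              failover = true, copy_data = false);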
On Tue, Dec 12, 2023 at 2:44 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 7:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I am > > asking this because the context used is TopMemoryContext which should > > be used only if we need something specific to be retained at the > > process level which doesn't seem to be the case here. > > > > Okay, I understand your concern. But this needs more thoughts on shall > we have all the slots synchronized in one txn or is it better to have > it existing way i.e. each slot being synchronized in its own txn > started in synchronize_one_slot. If we go by the former, can it have > any implications? > I think the one advantage of syncing each slot in a different transaction could have been if that helps with the visibility of updated slot information but that is not the case here as we always persist it to file. As per my understanding, here we need a transaction as we may access catalogs while creating/updating slots, so, a single transaction should be okay unless there are any other reasons. -- With Regards, Amit Kapila.
On Mon, Dec 11, 2023 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 1:22 PM Drouvot, Bertrand > <bertranddrouvot.pg@gmail.com> wrote: > > > > > If we agree > > > on that then it would be good to prohibit setting this GUC on standby > > > or at least it should be a no-op even if this GUC should be set on > > > physical standby. > > > > I'd prefer to completely prohibit it on standby (to make it very clear it's not > > working at all) as long as one can enable it without downtime once the standby > > is promoted (which is the case currently). > > And I think slot-sync worker should exit as well on cascading standby. Thoughts? > I think one has set all the valid parameters for the slot-sync worker on standby, we should not exit, rather it should be no-op which means it should not try to sync slots from another standby. One scenario where this may help is when users promote the standby which has already synced slots from the primary. In this case, cascading standby will become non-cascading and should sync slots. -- With Regards, Amit Kapila.
On Tue, Dec 12, 2023 at 5:56 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > A review on v45 patch: > > If one creates a logical slot with failover=true as - > select pg_create_logical_replication_slot('logical_slot','pgoutput', > false, true, true); > > Then, uses the existing logical slot while creating a subscription - > postgres=# create subscription sub4 connection 'dbname=postgres > host=localhost port=5433' publication pub1t4 WITH > (slot_name=logical_slot, create_slot=false, failover=true); > NOTICE: changed the failover state of replication slot "logical_slot" > on publisher to false > CREATE SUBSCRIPTION > > Despite configuring logical_slot's failover to true and specifying > failover=true during subscription creation, the NOTICE indicates a > change in the failover state to 'false', without providing any > explanation for this transition. > It can be confusing for users, so IMO, the notice should include the > reason for switching failover to 'false' or should give a hint to use > either refresh=false or copy_data=false to enable failover=true for > the slot as we do in other similar 'alter subscription...' scenarios. > Agree. The NOTICE should be more informative. thanks SHveta
On Wed, Dec 13, 2023 at 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Dec 11, 2023 at 1:22 PM Drouvot, Bertrand > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > If we agree > > > > on that then it would be good to prohibit setting this GUC on standby > > > > or at least it should be a no-op even if this GUC should be set on > > > > physical standby. > > > > > > I'd prefer to completely prohibit it on standby (to make it very clear it's not > > > working at all) as long as one can enable it without downtime once the standby > > > is promoted (which is the case currently). > > > > And I think slot-sync worker should exit as well on cascading standby. Thoughts? > > > > I think one has set all the valid parameters for the slot-sync worker > on standby, we should not exit, rather it should be no-op which means > it should not try to sync slots from another standby. One scenario > where this may help is when users promote the standby which has > already synced slots from the primary. In this case, cascading standby > will become non-cascading and should sync slots. > Right, then perhaps we should increase naptime in this no-op case. It could be even more then current inactivity naptime which is just 10sec. Shall it be say 5min in this case? thanks Shveta
Hi Shveta, here are some review comments for v45-0002. ====== doc/src/sgml/bgworker.sgml 1. + <variablelist> + <varlistentry> + <term><literal>BgWorkerStart_PostmasterStart</literal></term> + <listitem> + <para> + <indexterm><primary>BgWorkerStart_PostmasterStart</primary></indexterm> + Start as soon as postgres itself has finished its own initialization; + processes requesting this are not eligible for database connections. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>BgWorkerStart_ConsistentState</literal></term> + <listitem> + <para> + <indexterm><primary>BgWorkerStart_ConsistentState</primary></indexterm> + Start as soon as a consistent state has been reached in a hot-standby, + allowing processes to connect to databases and run read-only queries. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>BgWorkerStart_RecoveryFinished</literal></term> + <listitem> + <para> + <indexterm><primary>BgWorkerStart_RecoveryFinished</primary></indexterm> + Start as soon as the system has entered normal read-write state. Note + that the <literal>BgWorkerStart_ConsistentState</literal> and + <literal>BgWorkerStart_RecoveryFinished</literal> are equivalent + in a server that's not a hot standby. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>BgWorkerStart_ConsistentState_HotStandby</literal></term> + <listitem> + <para> + <indexterm><primary>BgWorkerStart_ConsistentState_HotStandby</primary></indexterm> + Same meaning as <literal>BgWorkerStart_ConsistentState</literal> but + it is more strict in terms of the server i.e. start the worker only + if it is hot-standby. + </para> + </listitem> + </varlistentry> + </variablelist> Maybe reorder these slightly, because I felt it is better if the BgWorkerStart_ConsistentState_HotStandby comes next after BgWorkerStart_ConsistentState, which it refers to For example:: 1st.BgWorkerStart_PostmasterStart 2nd.BgWorkerStart_ConsistentState 3rd.BgWorkerStart_ConsistentState_HotStandby 4th.BgWorkerStart_RecoveryFinished ====== doc/src/sgml/config.sgml 2. <varname>enable_syncslot</varname> = true Not sure, but I thought the "= true" part should be formatted too. SUGGESTION <literal>enable_syncslot = true</literal> ====== doc/src/sgml/logicaldecoding.sgml 3. + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the failover option during slot creation and setting + <varname>enable_syncslot</varname> on the standby. For the synchronization + to work, it is mandatory to have a physical replication slot between the + primary and the standby. It's highly recommended that the said physical + replication slot is listed in <varname>standby_slot_names</varname> on + the primary to prevent the subscriber from consuming changes faster than + the hot standby. Additionally, <varname>hot_standby_feedback</varname> + must be enabled on the standby for the slots synchronization to work. + </para> I felt those parts that describe the mandatory GUCs should be kept together. SUGGESTION For the synchronization to work, it is mandatory to have a physical replication slot between the primary and the standby, and <varname>hot_standby_feedback</varname> must be enabled on the standby. It's also highly recommended that the said physical replication slot is named in <varname>standby_slot_names</varname> list on the primary, to prevent the subscriber from consuming changes faster than the hot standby. ~~~ 4. 
(Chapter 49) By enabling synchronization of slots, logical replication can be resumed after failover depending upon the pg_replication_slots.sync_state for the synchronized slots on the standby at the time of failover. Only slots that were in ready sync_state ('r') on the standby before failover can be used for logical replication after failover. However, the slots which were in initiated sync_state ('i') and not sync-ready ('r') at the time of failover will be dropped and logical replication for such slots can not be resumed after failover. This applies to the case where a logical subscription is disabled before failover and is enabled after failover. If the synchronized slot due to disabled subscription could not be made sync-ready ('r') on standby, then the subscription can not be resumed after failover even when enabled. If the primary is idle, then the synchronized slots on the standby may take a noticeable time to reach the ready ('r') sync_state. This can be sped up by calling the pg_log_standby_snapshot function on the primary. ~ Somehow, I still felt all that was too wordy/repetitive. Below is my attempt to make it more concise. Thoughts? SUGGESTION The ability to resume logical replication after failover depends upon the pg_replication_slots.sync_state value for the synchronized slots on the standby at the time of failover. Only slots that have attained a "ready" sync_state ('r') on the standby before failover can be used for logical replication after failover. Slots that have not yet reached 'r' state (they are still 'i') will be dropped, therefore logical replication for those slots cannot be resumed. For example, if the synchronized slot could not become sync-ready on standby due to a disabled subscription, then the subscription cannot be resumed after failover even when it is enabled. If the primary is idle, the synchronized slots on the standby may take a noticeable time to reach the ready ('r') sync_state. This can be sped up by calling the pg_log_standby_snapshot function on the primary. ====== doc/src/sgml/system-views.sgml 5. + <para> + Defines slot synchronization state. This is meaningful on the physical + standby which has configured <varname>enable_syncslot</varname> = true + </para> As mentioned in the previous review comment ([1]#10) I thought it might be good to include a hyperlink cross-reference to the 'enable_syncslot' GUC. ~~~ 6. + <para> + The hot standby can have any of these sync_state for the slots but on a + hot standby, the slots with state 'r' and 'i' can neither be used for + logical decoding nor dropped by the user. The primary server will have + sync_state as 'n' for all the slots. But if the standby is promoted to + become the new primary server, sync_state can be seen 'r' as well. On + this new primary server, slots with sync_state as 'r' and 'n' will + behave the same. + </para></entry> 6a. /these sync_state for the slots/these sync_state values for the slots/ ~ 6b Hm. I still felt (same as previous review [1]#12b) that there seems too much information here. IIUC the sync_state is only meaningful on the standby. Sure, it might have some values line 'n' or 'r' on the primary also, but those either mean nothing ('n') or are leftover states from a previous failover from a standby ('r'), which also means nothing. So can't we just say it more succinctly like that? SUGGESTION The sync_state has no meaning on the primary server; the primary sync_state value is default 'n' for all slots but may (if leftover from a promoted standby) also be 'r'. 
====== .../libpqwalreceiver/libpqwalreceiver.c 7. static void libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, - bool failover) + bool failover) Still seems to be tampering with indentation that should only be in patch 0001. ====== src/backend/replication/logical/slotsync.c 8. wait_for_primary_slot_catchup The meaning of the boolean return of this function is still not described by the function comment. ~~~ 9. + * If passed, *wait_attempts_exceeded will be set to true only if this + * function exits after exhausting its wait attempts. It will be false + * in all the other cases like failure, remote-slot invalidation, primary + * could catch up. The above already says when a return false happens, so it seems overkill to give more information. SUGGESTION If passed, *wait_attempts_exceeded will be set to true only if this function exits due to exhausting its wait attempts. It will be false in all the other cases. ~~~ 10. +static bool +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *wait_attempts_exceeded) +{ +#define WAIT_OUTPUT_COLUMN_COUNT 4 +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 + 10a Maybe the long constant name is too long. How about WAIT_PRIMARY_CATCHUP_ATTEMPTS? ~~~ 10b. IMO it is better to Assert the input value of this kind of side-effect return parameter, to give a better understanding and to prevent future accidents. SUGGESTION Assert(wait_attempts_exceeded == NULL |} *wait_attempts_exceeded == false); ~~~ 11. synchronize_one_slot + ReplicationSlot *s; + ReplicationSlot *slot; + char sync_state = '\0'; 11a. I don't think you need both 's' and 'slot' ReplicationSlot -- it looks a bit odd. Can't you just reuse the one 'slot' variable? ~ 11b. Also, maybe those assignment like + slot = MyReplicationSlot; can have an explanatory comment like: /* For convenience, we assign MyReplicationSlot to a shorter variable name. */ ~~~ 12. +static void +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) +{ + ReplicationSlot *s; + ReplicationSlot *slot; + char sync_state = '\0'; In my previous review [1]#33a I thought it was strange to assign the sync_state (which is essentially an enum) to some meaningless value, so I suggested it should be set to SYNCSLOT_STATE_NONE in the declaration. The reply [2] was "No, that will change the flow. It should stay uninitialized if the slot is not found." But I am not convinced there is any flow problem. Also, SYNCSLOT_STATE_NONE seems the naturally correct default for something with no state. It cannot be found and be SYNCSLOT_STATE_NONE at the same time (that is reported as an ERROR "skipping sync of slot") so I see no problem. The CURRENT code is like this: /* Slot created by the slot sync worker exists, sync it */ if (sync_state) { Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == SYNCSLOT_STATE_INITIATED); ... } /* Otherwise create the slot first. */ else { ... } AFAICT that could easily be changed to like below, with no change to the logic, and it avoids setting strange values. SUGGESTION. if (sync_state == SYNCSLOT_STATE_NONE) { /* Slot not found. Create it. */ .. } else { Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == SYNCSLOT_STATE_INITIATED); ... } ~~~ 13. synchronize_one_slot +static void +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) This *slot_updated parameter looks dubious. It is used in a loop from the caller to mean that ANY slot was updated -- e.g. 
maybe it is true or false on entry to this function. But, Instead of having some dependency between this function and the caller, IMO it makes more sense if we would make this just a boolean function in the first place (e.g. was updated? T/F) Then the caller can also be written more easily like: some_slot_updated |= synchronize_one_slot(wrconn, remote_slot); ~~~ 14. + /* Search for the named slot */ + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) + { + SpinLockAcquire(&s->mutex); + sync_state = s->data.sync_state; + SpinLockRelease(&s->mutex); + + /* User created slot with the same name exists, raise ERROR. */ + if (sync_state == SYNCSLOT_STATE_NONE) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby"))); + } + } Extra curly brackets around the ereport are not needed. ~~~ 15. + /* + * Sanity check: With hot_standby_feedback enabled and + * invalidations handled apropriately as above, this should never + * happen. + */ + if (remote_slot->restart_lsn < slot->data.restart_lsn) + { + elog(ERROR, + "not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(slot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); + } 15a. /apropriately/appropriately/ ~ 15b. Extra curly brackets around the elog are not needed. ~~~ 16. synchronize_slots +static bool +synchronize_slots(WalReceiverConn *wrconn) +{ +#define SLOTSYNC_COLUMN_COUNT 9 + Oid slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID, + LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, INT2OID}; + + WalRcvExecResult *res; + TupleTableSlot *tupslot; + StringInfoData s; + List *remote_slot_list = NIL; + MemoryContext oldctx = CurrentMemoryContext; + ListCell *lc; + bool slot_updated = false; Suggest renaming 'slot_updated' to 'some_slot_updated' or 'update_occurred' etc because the current name makes it look like it applies to a single slot, but it doesn't. ~~~ 17. + SpinLockAcquire(&WalRcv->mutex); + if (!WalRcv || + (WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) + { + SpinLockRelease(&WalRcv->mutex); + return slot_updated; + } + SpinLockRelease(&WalRcv->mutex); IMO "return false;" here is more clear than saying "return slot_updated;" ~~~ 18. + appendStringInfo(&s, + "SELECT slot_name, plugin, confirmed_flush_lsn," + " restart_lsn, catalog_xmin, two_phase, failover," + " database, pg_get_slot_invalidation_cause(slot_name)" + " FROM pg_catalog.pg_replication_slots" + " WHERE failover and sync_state != 'i'"); 18a. /and/AND/ ~ 18b. In the reply post (see [2]#32) Shveta said "I could not find quote_* function for a character just like we have 'quote_literal_cstr' for string". If you still want to use constant substitution instead of just hardwired 'i' then why do even you need a quote_* function? I thought the appendStringInfo uses a printf style format-string internally, so I assumed it is possible to substitute the state char directly using '%c'. ~~~ 19. + + + + /* We are done, free remote_slot_list elements */ + list_free_deep(remote_slot_list); + + walrcv_clear_result(res); + + return slot_updated; +} Excessive blank lines. ~~~ 20. 
validate_primary_slot + appendStringInfo(&cmd, + "SELECT count(*) = 1 from pg_replication_slots " + "WHERE slot_type='physical' and slot_name=%s", + quote_literal_cstr(PrimarySlotName)); /and/AND/ ~~~ 21. + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, slot); + Assert(tuple_ok); /* It must return one tuple */ IMO it's better to use all the var names the same across all functions? So call this 'tupslot' like the other MakeSingleTupleTableSlot result. ~~~ 22. validate_slotsync_parameters +/* + * Checks if GUCs are set appropriately before starting slot sync worker + * + * The slot sync worker can not start if 'enable_syncslot' is off and + * since 'enable_syncslot' is ON, check that the other GUC settings + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) + * are compatible with slot synchronization. If not, raise ERROR. + */ +static void +validate_slotsync_parameters(char **dbname) +{ 22a. The comment is quite verbose. IMO the 2nd para seems just unnecessary detail of the 1st para. SUGGESTION Check that all necessary GUCs for slot synchronization are set appropriately. If not, raise an ERROR. ~~~ 22b. IMO (and given what was said in the comment about enable_syncslot must be on)the first statement of this function should be: /* Sanity check. */ Assert(enable_syncslot); ~~~ 23. slotsync_reread_config + old_dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + Assert(old_dbname); (This is same comment as old review [1]#61) Hmm. I still don't see why this extraction of the dbname cannot be deferred until later when you know the PrimaryConnInfo has changed, otherwise, it might be redundant to do this. Shveta replied [2] that "Once PrimaryConnInfo is changed, we can not get old-dbname.", but I'm not so sure. Isn't this walrcv_get_dbname_from_conninfo just doing a string search -- Why can't you defer this until you know conninfoChanged is true, and then to get the old_dbname, you can just pass the old_primary_conninfo. E.g. call like walrcv_get_dbname_from_conninfo(old_primary_conninfo); Maybe I am mistaken. ~~ 24. + /* + * Since we have initialized this worker with the old dbname, thus + * exit if dbname changed. Let it get restarted and connect to the new + * dbname specified. + */ + if (conninfoChanged && strcmp(old_dbname, new_dbname) != 0) + ereport(ERROR, + errmsg("exiting slot sync worker as dbname in " + "primary_conninfo changed")); IIUC when the tablesync has to restart, it emits a LOG message before it exits; but it's not an ERROR. So, shouldn't this be similar -- IMO it is not an "error" for the user to wish to change the dbname. Maybe this should be LOG followed by an explicit exit. If you agree, then it might be better to encapsulate such logic in some little function: // pseudo-code void slotsync_worker_restart(const char *msg) { ereport(LOG, msg... exit(0); } ~~~ 25. ReplSlotSyncWorkerMain + for (;;) + { + int rc; + long naptime = WORKER_DEFAULT_NAPTIME_MS; + TimestampTz now; + bool slot_updated; + + ProcessSlotSyncInterrupts(wrconn); + + slot_updated = synchronize_slots(wrconn); Here I think the 'slot_updated' should be renamed to the same name as in #16 above (e.g. 'some_slot_updated' or 'any_slot_updated' or 'update_occurred' etc). ~~~ 26. SlotSyncWorkerRegister + if (!enable_syncslot) + { + ereport(LOG, + errmsg("skipping slots synchronization because enable_syncslot is " + "disabled.")); + return; + } Instead of saying "because..." 
in the error message maybe keep the message more terse and describe the "because" part in the errdetail SUGGESTION errmsg("skipping slot synchronization") errdetail("enable_syncslot is disabled.") ====== src/backend/replication/slot.c 27. + * sync_state: Defines slot synchronization state. For user created slots, it + * is SYNCSLOT_STATE_NONE and for the slots being synchronized on the physical + * standby, it is either SYNCSLOT_STATE_INITIATED or SYNCSLOT_STATE_READY */ void ReplicationSlotCreate(const char *name, bool db_specific, ReplicationSlotPersistency persistency, - bool two_phase, bool failover) + bool two_phase, bool failover, char sync_state) 27a. Why is this comment even mentioning SYNCSLOT_STATE_READY? IIUC it doesn't make sense to ever call ReplicationSlotCreate directly setting the 'r' state (e.g., bypassing 'i' ???) ~ 27b. Indeed, IMO there should be Assert(sync_state == SYNCSLOT_STATE_NONE || syncstate == SYNCSLOT_STATE_INITIATED); to guarantee this. ====== src/include/replication/slot.h 28. + /* + * Is this a slot created by a sync-slot worker? + * + * Only relevant for logical slots on the physical standby. + */ + char sync_state; + (probably I am repeating a previous thought here) The comment says the field is only relevant for standby, and that's how I've been visualizing it, and why I had previously suggested even renaming it to 'standby_sync_state'. However, replies are saying that after failover these sync_states also have "some meaning for the primary server". That's the part I have trouble understanding. IIUC the server states are just either all 'n' (means nothing) or 'r' because they are just leftover from the old standby state. So, does it *truly* have meaning for the server? Or should those states somehow be removed/ignored on the new primary? Anyway, the point is that if this field does have meaning also on the primary (I doubt) then those details should be in this comment. Otherwise "Only relevant ... on the standby" is too misleading. ====== .../t/050_standby_failover_slots_sync.pl 29. +# Create table and publication on primary +$primary->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int PRIMARY KEY);"); +$primary->safe_psql('postgres', "CREATE PUBLICATION mypub3 FOR TABLE tab_mypub3;"); + 29a. /on primary/on the primary/ ~ 29b. Consider to combine those DDL ~ 29c. Perhaps for consistency, you should be calling this 'regress_mypub3'. ~~~ 30. +# Create a subscriber node +my $subscriber3 = PostgreSQL::Test::Cluster->new('subscriber3'); +$subscriber3->init(allows_streaming => 'logical'); +$subscriber3->start; +$subscriber3->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int PRIMARY KEY);"); + +# Create a subscription with failover = true & wait for sync to complete. +$subscriber3->safe_psql('postgres', + "CREATE SUBSCRIPTION mysub3 CONNECTION '$publisher_connstr' " + . "PUBLICATION mypub3 WITH (slot_name = lsub3_slot, failover = true);"); +$subscriber3->wait_for_subscription_sync; 30a Consider combining those DDLs. ~ 30b. Probably for consistency, you should be calling this 'regress_mysub3'. ~~~ 31. +# Advance lsn on the primary +$primary->safe_psql('postgres', + "SELECT pg_log_standby_snapshot();"); +$primary->safe_psql('postgres', + "SELECT pg_log_standby_snapshot();"); +$primary->safe_psql('postgres', + "SELECT pg_log_standby_snapshot();"); + Consider combining all those DDLs. ~~~ 32. 
+# Truncate table on primary +$primary->safe_psql('postgres', + "TRUNCATE TABLE tab_mypub3;"); + +# Insert data on the primary +$primary->safe_psql('postgres', + "INSERT INTO tab_mypub3 SELECT generate_series(1, 10);"); + Consider combining those DDLs. ~~~ 33. +# Confirm that restart_lsn of lsub3_slot slot is synced to the standby +$result = $standby3->safe_psql('postgres', + qq[SELECT '$primary_lsn' >= restart_lsn from pg_replication_slots WHERE slot_name = 'lsub3_slot';]); +is($result, 't', 'restart_lsn of slot lsub3_slot synced to standby'); Does "'$primary_lsn' >= restart_lsn" make sense here? NOTE, the sign was '<=' in v43-0002 ~~~ 34. +# Confirm that confirmed_flush_lsn of lsub3_slot slot is synced to the standby +$result = $standby3->safe_psql('postgres', + qq[SELECT '$primary_lsn' >= confirmed_flush_lsn from pg_replication_slots WHERE slot_name = 'lsub3_slot';]); +is($result, 't', 'confirmed_flush_lsn of slot lsub3_slot synced to the standby'); Does "'$primary_lsn' >= confirmed_flush_lsn" make sense here? NOTE, the sign was '<=' in v43-0002 ~~~ 35. +################################################## +# Test that synchronized slot can neither be docoded nor dropped by the user +################################################## 35a. /docoded/decoded/ ~ 35b. Please give explanation in the comment *why* those ops are not allowed (e.g. because the hot_standby_feedback GUC does not have an accepted value) ~~~ 36. +################################################## +# Create another slot which stays in sync_state as initiated ('i') +################################################## + Please explain the comment as to *why* it gets stuck in the initiated state. ~~~ 37. +################################################## +# Promote the standby3 to primary. Confirm that: +# a) the sync-ready('r') slot 'lsub3_slot' is retained on new primary +# b) the initiated('i') slot 'logical_slot'is dropped on promotion +# c) logical replication for mysub3 is resumed succesfully after failover +################################################## /'logical_slot'is/'logical_slot' is/ (missing space) /succesfully/successfully/ ~~~ 38. +# Update subscription with new primary's connection info +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 DISABLE;"); +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 CONNECTION '$standby3_conninfo';"); +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 ENABLE;"); Consider combining all those DDLs. ~~~ 39. + +# Insert data on the new primary +$standby3->safe_psql('postgres', + "INSERT INTO tab_mypub3 SELECT generate_series(11, 20);"); + +# Confirm that data in tab_mypub3 replicated on subscriber +is( $subscriber3->safe_psql('postgres', q{SELECT count(*) FROM tab_mypub3;}), + "20", + 'data replicated from new primary'); Shouldn't there be some wait_for_subscription_sync logic (or similar) here just to ensure the subscriber3 had time to receive that data before you immediately check that it had arrived? ====== [1] My v43-0002 review. https://www.postgresql.org/message-id/CAHut%2BPuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv%2BzcT-f0GA%40mail.gmail.com [2] Replies to v43-0002 review. https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
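To make comments 22b and 26 concrete, a minimal sketch of the suggested shape is below. The function and GUC names (SlotSyncWorkerRegister, validate_slotsync_parameters, enable_syncslot) are taken from the quoted patch, but the bodies here are illustrative only, not the patch's actual code.

/* Sketch only: the Assert from comment 22b and the errmsg/errdetail split from comment 26. */
static void
SlotSyncWorkerRegister(void)
{
	if (!enable_syncslot)
	{
		ereport(LOG,
				errmsg("skipping slot synchronization"),
				errdetail("enable_syncslot is disabled."));
		return;
	}

	/* ... register the background worker, as the patch already does ... */
}

static void
validate_slotsync_parameters(char **dbname)
{
	/* Sanity check: the worker is only registered when enable_syncslot is on. */
	Assert(enable_syncslot);

	/* ... validate primary_conninfo, primary_slot_name, hot_standby_feedback, wal_level ... */
}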
On Wed, Dec 13, 2023 at 11:42 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Dec 13, 2023 at 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Dec 11, 2023 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Mon, Dec 11, 2023 at 1:22 PM Drouvot, Bertrand > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > If we agree > > > > > on that then it would be good to prohibit setting this GUC on standby > > > > > or at least it should be a no-op even if this GUC should be set on > > > > > physical standby. > > > > > > > > I'd prefer to completely prohibit it on standby (to make it very clear it's not > > > > working at all) as long as one can enable it without downtime once the standby > > > > is promoted (which is the case currently). > > > > > > And I think slot-sync worker should exit as well on cascading standby. Thoughts? > > > > > > > I think one has set all the valid parameters for the slot-sync worker > > on standby, we should not exit, rather it should be no-op which means > > it should not try to sync slots from another standby. One scenario > > where this may help is when users promote the standby which has > > already synced slots from the primary. In this case, cascading standby > > will become non-cascading and should sync slots. > > > > Right, then perhaps we should increase naptime in this no-op case. It > could be even more then current inactivity naptime which is just > 10sec. Shall it be say 5min in this case? > PFA v47 attached, changes are: patch 001: 1) Addressed comment in [1]. Thanks Hou-san for this change. patch 002 2) Slot sync worker will be no-op if it is on cascading standby as suggested in [2] 3) StartTransaction related optimization as suggested in [3] 4) Few other comments' improvement and code-cleanup. TODO: --Few pending comments as I stated in [4] (mainly header inclusion in tablesync.c, and 'r' to 'n' conversion on promotion) --The comments given today in [5] [1]: https://www.postgresql.org/message-id/CABdArM4Cow6aOLjGG9qnp6mhg%2B%2BgjK%3DHDO%3DKSU%3D6%3DyT7hLkknQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAA4eK1Ki1O65SyA6ijh-Mq4zpzeh644fCmkrZXMJcQXHNrAw0Q%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAA4eK1L3DiKL_Wq-VdU%2B9wmjmO5%2Bfrf%3DZHK9Lzq-7zOezPP%2BWg%40mail.gmail.com [4]: https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com [5]: https://www.postgresql.org/message-id/CAHut%2BPtOc7J_n24HJ6f_dFWTuD3X2ApOByQzZf6jZz%2B0wb-ebQ%40mail.gmail.com thanks Shveta
Attachment
Hi, here are a few more review comments for the patch v47-0002 (plus my review comments of v45-0002 [1] are yet to be addressed) ====== 1. General For consistency and readability, try to use variables of the same names whenever they have the same purpose, even when they declared are in different functions. A few like this were already mentioned in the previous review but there are more I keep noticing. For example, 'slotnameChanged' in function, VERSUS 'primary_slot_changed' in the caller. ====== src/backend/replication/logical/slotsync.c 2. +/* + * + * Validates the primary server info. + * + * Using the specified primary server connection, it verifies whether the master + * is a standby itself and returns true in that case to convey the caller that + * we are on the cascading standby. + * But if master is the primary server, it goes ahead and validates + * primary_slot_name. It emits error if the physical slot in primary_slot_name + * does not exist on the primary server. + */ +static bool +validate_primary_info(WalReceiverConn *wrconn) 2a. Extra line top of that comment? ~ 2b. IMO it is too tricky to have a function called "validate_xxx", when actually you gave that return value some special unintuitive meaning other than just validation. IMO it is always better for the returned value to properly match the function name so the expectations are very obvious. So, In this case, I think a better function signature would be like this: SUGGESTION static void validate_primary_info(WalReceiverConn *wrconn, bool *master_is_standby) or static void validate_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) ~~~ 3. + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch recovery and primary_slot_name \"%s\" info from the " + "primary: %s", PrimarySlotName, res->err))); I'm not sure that including "recovery and" in the error message is meaningful to the user, is it? ~~~ 4. slotsync_reread_config +/* + * Re-read the config file. + * + * If any of the slot sync GUCs have changed, re-validate them. The + * worker will exit if the check fails. + * + * Returns TRUE if primary_slot_name is changed, let the caller re-verify it. + */ +static bool +slotsync_reread_config(WalReceiverConn *wrconn) Hm. This is another function where the return value has been butchered to have a special meaning unrelated the the function name. IMO it makes the code unnecessarily confusing. IMO a better function signature here would be: static void slotsync_reread_config(WalReceiverConn *wrconn, bool *primary_slot_name_changed) ~~~ 5. ProcessSlotSyncInterrupts +/* + * Interrupt handler for main loop of slot sync worker. + */ +static bool +ProcessSlotSyncInterrupts(WalReceiverConn *wrconn, bool check_cascading_standby) +{ There is no function comment describing the meaning of the return value. But actually, IMO this is an example of how conflating the meanings of validation VERSUS are_we_cascading_standby in the lower-down function has propagated up to become a big muddle. The code + if (primary_slot_changed || check_cascading_standby) + return validate_primary_info(wrconn); seems unnecessarily hard to understand because, false -- doesn't mean invalid true -- doesn't mean valid Please, consider changing this signature also so the functions return what you would intuitively expect them to return without surprisingly different meanings. SUGGESTION static void ProcessSlotSyncInterrupts(WalReceiverConn *wrconn, bool check_cascading_standby, bool *am_cascading_standby) ~~~ 6. 
ReplSlotSyncWorkerMain + int rc; + long naptime = WORKER_DEFAULT_NAPTIME_MS; + TimestampTz now; + bool slot_updated; + + /* + * The transaction env is needed by walrcv_exec() in both the slot + * sync and primary info validation flow. + */ + StartTransactionCommand(); + + if (!am_cascading_standby) + { + slot_updated = synchronize_slots(wrconn); + + /* + * If any of the slots get updated in this sync-cycle, use default + * naptime and update 'last_update_time'. But if no activity is + * observed in this sync-cycle, then increase naptime provided + * inactivity time reaches threshold. + */ + now = GetCurrentTimestamp(); + if (slot_updated) + last_update_time = now; + else if (TimestampDifferenceExceeds(last_update_time, + now, WORKER_INACTIVITY_THRESHOLD_MS)) + naptime = WORKER_INACTIVITY_NAPTIME_MS; + } + else + naptime = 6 * WORKER_INACTIVITY_NAPTIME_MS; /* 60 sec */ + + rc = WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + naptime, + WAIT_EVENT_REPL_SLOTSYNC_MAIN); + + if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); + + am_cascading_standby = + ProcessSlotSyncInterrupts(wrconn, am_cascading_standby); + + CommitTransactionCommand(); IMO it is more natural to avoid negative conditions, so just reverse these. Also, some comment is needed to explain why the longer naptime is needed in this special case. SUGGESTION if (am_cascading_standby) { /* comment the reason .... */ naptime = 6 * WORKER_INACTIVITY_NAPTIME_MS; /* 60 sec */ } else { /* Normal standby */ ... } ====== [1] review of v45-0002. https://www.postgresql.org/message-id/CAHut%2BPtOc7J_n24HJ6f_dFWTuD3X2ApOByQzZf6jZz%2B0wb-ebQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
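As a reference for comments 2b and 5, here is a rough sketch of the out-parameter style being suggested; the names follow the suggestions above, and fetch_remote_in_recovery() is a hypothetical placeholder, not a function from the patch.

/*
 * Sketch only: the out parameter carries the single cascading-standby
 * answer, and validation failures are reported via ERROR, so the return
 * channel has one obvious meaning.
 */
static bool fetch_remote_in_recovery(WalReceiverConn *wrconn);	/* hypothetical */

static void
check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby)
{
	/*
	 * Query the primary for pg_is_in_recovery() and for the existence of
	 * primary_slot_name; ereport(ERROR, ...) on validation failure.
	 */
	*am_cascading_standby = fetch_remote_in_recovery(wrconn);
}

The caller then reads naturally as check_primary_info(wrconn, &am_cascading_standby); with no special meaning hidden in a return value.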
A review comment for v47-0001 ====== src/backend/replication/slot.c 1. GetStandbySlotList +static void +WalSndRereadConfigAndReInitSlotList(List **standby_slots) +{ + char *pre_standby_slot_names; + + ProcessConfigFile(PGC_SIGHUP); + + /* + * If we are running on a standby, there is no need to reload + * standby_slot_names since we do not support syncing slots to cascading + * standbys. + */ + if (RecoveryInProgress()) + return; Should the RecoveryInProgress() check be first -- even before the ProcessConfigFile call? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Dec 13, 2023 at 3:53 PM Peter Smith <smithpb2250@gmail.com> wrote: > > 12. > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > +{ > + ReplicationSlot *s; > + ReplicationSlot *slot; > + char sync_state = '\0'; > > In my previous review [1]#33a I thought it was strange to assign the > sync_state (which is essentially an enum) to some meaningless value, > so I suggested it should be set to SYNCSLOT_STATE_NONE in the > declaration. The reply [2] was "No, that will change the flow. It > should stay uninitialized if the slot is not found." > > But I am not convinced there is any flow problem. Also, > SYNCSLOT_STATE_NONE seems the naturally correct default for something > with no state. It cannot be found and be SYNCSLOT_STATE_NONE at the > same time (that is reported as an ERROR "skipping sync of slot") so I > see no problem. > > The CURRENT code is like this: > > /* Slot created by the slot sync worker exists, sync it */ > if (sync_state) > { > Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == > SYNCSLOT_STATE_INITIATED); > ... > } > /* Otherwise create the slot first. */ > else > { > ... > } > > AFAICT that could easily be changed to like below, with no change to > the logic, and it avoids setting strange values. > > SUGGESTION. > > if (sync_state == SYNCSLOT_STATE_NONE) > { > /* Slot not found. Create it. */ > .. > } > else > { > Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == > SYNCSLOT_STATE_INITIATED); > ... > } > I think instead of creating syncslot based on syncstate, it would be better to create it when we don't find it via SearchNamedReplicationSlot(). That will avoid the need to initialize the syncstate and I think it would make code in this area look better. > ~~~ > > 13. synchronize_one_slot > > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > > This *slot_updated parameter looks dubious. It is used in a loop from > the caller to mean that ANY slot was updated -- e.g. maybe it is true > or false on entry to this function. > > But, Instead of having some dependency between this function and the > caller, IMO it makes more sense if we would make this just a boolean > function in the first place (e.g. was updated? T/F) > > Then the caller can also be written more easily like: > > some_slot_updated |= synchronize_one_slot(wrconn, remote_slot); > +1. > > 23. slotsync_reread_config > > + old_dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + Assert(old_dbname); > > (This is same comment as old review [1]#61) > > Hmm. I still don't see why this extraction of the dbname cannot be > deferred until later when you know the PrimaryConnInfo has changed, > otherwise, it might be redundant to do this. Shveta replied [2] that > "Once PrimaryConnInfo is changed, we can not get old-dbname.", but I'm > not so sure. Isn't this walrcv_get_dbname_from_conninfo just doing a > string search -- Why can't you defer this until you know > conninfoChanged is true, and then to get the old_dbname, you can just > pass the old_primary_conninfo. E.g. call like > walrcv_get_dbname_from_conninfo(old_primary_conninfo); Maybe I am > mistaken. > I think we should just restart if any one of the information is changed with a message like: "slotsync worker will restart because of a parameter change". This would be similar to what we do apply worker in maybe_reread_subscription(). > > 28. > + /* > + * Is this a slot created by a sync-slot worker? 
> + * > + * Only relevant for logical slots on the physical standby. > + */ > + char sync_state; > + > > (probably I am repeating a previous thought here) > > The comment says the field is only relevant for standby, and that's > how I've been visualizing it, and why I had previously suggested even > renaming it to 'standby_sync_state'. However, replies are saying that > after failover these sync_states also have "some meaning for the > primary server". > > That's the part I have trouble understanding. IIUC the server states > are just either all 'n' (means nothing) or 'r' because they are just > leftover from the old standby state. So, does it *truly* have meaning > for the server? Or should those states somehow be removed/ignored on > the new primary? Anyway, the point is that if this field does have > meaning also on the primary (I doubt) then those details should be in > this comment. Otherwise "Only relevant ... on the standby" is too > misleading. > I think this deserves more comments. -- With Regards, Amit Kapila.
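A minimal sketch of the boolean-return shape for synchronize_one_slot that comment 13 suggests and the +1 above agrees with; the list handling is simplified (remote_slot_list is assumed to be filled from the primary, as in the patch), so this is illustrative rather than the patch's code.

/* Returns true if the local slot was created or modified in this cycle. */
static bool synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot);

static bool
synchronize_slots(WalReceiverConn *wrconn)
{
	List	   *remote_slot_list = NIL;	/* assumed: fetched from the primary */
	bool		some_slot_updated = false;
	ListCell   *lc;

	foreach(lc, remote_slot_list)
	{
		RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);

		some_slot_updated |= synchronize_one_slot(wrconn, remote_slot);
	}

	return some_slot_updated;
}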
On Thursday, December 14, 2023 12:45 PM Peter Smith <smithpb2250@gmail.com> wrote: > A review comment for v47-0001 Thanks for the comment. > > ====== > src/backend/replication/slot.c > > 1. GetStandbySlotList > > +static void > +WalSndRereadConfigAndReInitSlotList(List **standby_slots) { > + char *pre_standby_slot_names; > + > + ProcessConfigFile(PGC_SIGHUP); > + > + /* > + * If we are running on a standby, there is no need to reload > + * standby_slot_names since we do not support syncing slots to > + cascading > + * standbys. > + */ > + if (RecoveryInProgress()) > + return; > > Should the RecoveryInProgress() check be first -- even before the > ProcessConfigFile call? ProcessConfigFile is necessary here; it is needed not only for standby_slot_names but also for all the other GUCs that could be used in the caller. Best Regards, Hou zj
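In other words, the early return is only meant to skip rebuilding the standby_slot_names list, not the config reload itself. Annotating the quoted function with that reasoning (the comments and the elided tail are illustrative, not the patch's code):

static void
WalSndRereadConfigAndReInitSlotList(List **standby_slots)
{
	/*
	 * Reload the configuration first: callers rely on other GUCs being
	 * refreshed as well, not just standby_slot_names, so this must run
	 * even on a standby.
	 */
	ProcessConfigFile(PGC_SIGHUP);

	/*
	 * On a standby there is no need to rebuild the standby_slot_names
	 * list, since syncing slots to cascading standbys is not supported.
	 */
	if (RecoveryInProgress())
		return;

	/* ... re-initialize *standby_slots from standby_slot_names ... */
}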
On Thu, Dec 14, 2023 at 7:00 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi, here are a few more review comments for the patch v47-0002 > > (plus my review comments of v45-0002 [1] are yet to be addressed) > > ====== > 1. General > > For consistency and readability, try to use variables of the same > names whenever they have the same purpose, even when they declared are > in different functions. A few like this were already mentioned in the > previous review but there are more I keep noticing. > > For example, > 'slotnameChanged' in function, VERSUS 'primary_slot_changed' in the caller. > > > ====== > src/backend/replication/logical/slotsync.c > > 2. > +/* > + * > + * Validates the primary server info. > + * > + * Using the specified primary server connection, it verifies whether > the master > + * is a standby itself and returns true in that case to convey the caller that > + * we are on the cascading standby. > + * But if master is the primary server, it goes ahead and validates > + * primary_slot_name. It emits error if the physical slot in primary_slot_name > + * does not exist on the primary server. > + */ > +static bool > +validate_primary_info(WalReceiverConn *wrconn) > > 2b. > IMO it is too tricky to have a function called "validate_xxx", when > actually you gave that return value some special unintuitive meaning > other than just validation. IMO it is always better for the returned > value to properly match the function name so the expectations are very > obvious. So, In this case, I think a better function signature would > be like this: > > SUGGESTION > > static void > validate_primary_info(WalReceiverConn *wrconn, bool *master_is_standby) > > or > > static void > validate_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) > The terminology master_is_standby is a bit indirect for this usage, so I would prefer the second one. Shall we name this function as check_primary_info()? Additionally, can we rewrite the following comment: "Using the specified primary server connection, check whether we are cascading standby. It also validates primary_slot_info for non-cascading-standbys.". -- With Regards, Amit Kapila.
On Thu, Dec 14, 2023 at 4:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Dec 14, 2023 at 7:00 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Hi, here are a few more review comments for the patch v47-0002 > > > > (plus my review comments of v45-0002 [1] are yet to be addressed) > > > > ====== > > 1. General > > > > For consistency and readability, try to use variables of the same > > names whenever they have the same purpose, even when they declared are > > in different functions. A few like this were already mentioned in the > > previous review but there are more I keep noticing. > > > > For example, > > 'slotnameChanged' in function, VERSUS 'primary_slot_changed' in the caller. > > > > > > ====== > > src/backend/replication/logical/slotsync.c > > > > 2. > > +/* > > + * > > + * Validates the primary server info. > > + * > > + * Using the specified primary server connection, it verifies whether > > the master > > + * is a standby itself and returns true in that case to convey the caller that > > + * we are on the cascading standby. > > + * But if master is the primary server, it goes ahead and validates > > + * primary_slot_name. It emits error if the physical slot in primary_slot_name > > + * does not exist on the primary server. > > + */ > > +static bool > > +validate_primary_info(WalReceiverConn *wrconn) > > > > 2b. > > IMO it is too tricky to have a function called "validate_xxx", when > > actually you gave that return value some special unintuitive meaning > > other than just validation. IMO it is always better for the returned > > value to properly match the function name so the expectations are very > > obvious. So, In this case, I think a better function signature would > > be like this: > > > > SUGGESTION > > > > static void > > validate_primary_info(WalReceiverConn *wrconn, bool *master_is_standby) > > > > or > > > > static void > > validate_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) > > > > The terminology master_is_standby is a bit indirect for this usage, so > I would prefer the second one. Shall we name this function as > check_primary_info()? Additionally, can we rewrite the following > comment: "Using the specified primary server connection, check whether > we are cascading standby. It also validates primary_slot_info for > non-cascading-standbys.". > > -- > With Regards, > Amit Kapila. PFA v48. Changes are: 1) Addressed comments by Peter for v45-002 and v47-002 given in [1] and [2] respectively 2) Addressed comments by Amit for v47-002 given in [3], [4] 3) Addressed an old comment (#74 in [5]) of getting rid of header inclusion from tablesync.c when there was no code change in that file. Thanks Hou-san for working on this change. TODO: --Address the test comments in [1] for 050_standby_failover_slots_sync.pl --Review the feasibility of addressing one pending comment (comment 13 in [5]) of 'r'->'n' conversion. [1]: https://www.postgresql.org/message-id/CAHut%2BPtOc7J_n24HJ6f_dFWTuD3X2ApOByQzZf6jZz%2B0wb-ebQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAHut%2BPsvxs-%3Dj3aCpPVs3e4w78HndCdO-F4bLPzAX70%2BdgWUuQ%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAA4eK1L2ts%3DgfiF4aw7-DH8HWj29s08hVRq-Ff8%3DmjfdUXx8CA%40mail.gmail.com [4]: https://www.postgresql.org/message-id/CAA4eK1%2Bw9yv%2B4UZXhiDHZpGDfbeRHYDBu23FwsniS8sYUZeu1w%40mail.gmail.com [5]: https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com thanks Shveta
On Fri, Dec 15, 2023 at 11:02 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Dec 14, 2023 at 4:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Dec 14, 2023 at 7:00 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Hi, here are a few more review comments for the patch v47-0002 > > > > > > (plus my review comments of v45-0002 [1] are yet to be addressed) > > > > > > ====== > > > 1. General > > > > > > For consistency and readability, try to use variables of the same > > > names whenever they have the same purpose, even when they declared are > > > in different functions. A few like this were already mentioned in the > > > previous review but there are more I keep noticing. > > > > > > For example, > > > 'slotnameChanged' in function, VERSUS 'primary_slot_changed' in the caller. > > > > > > > > > ====== > > > src/backend/replication/logical/slotsync.c > > > > > > 2. > > > +/* > > > + * > > > + * Validates the primary server info. > > > + * > > > + * Using the specified primary server connection, it verifies whether > > > the master > > > + * is a standby itself and returns true in that case to convey the caller that > > > + * we are on the cascading standby. > > > + * But if master is the primary server, it goes ahead and validates > > > + * primary_slot_name. It emits error if the physical slot in primary_slot_name > > > + * does not exist on the primary server. > > > + */ > > > +static bool > > > +validate_primary_info(WalReceiverConn *wrconn) > > > > > > 2b. > > > IMO it is too tricky to have a function called "validate_xxx", when > > > actually you gave that return value some special unintuitive meaning > > > other than just validation. IMO it is always better for the returned > > > value to properly match the function name so the expectations are very > > > obvious. So, In this case, I think a better function signature would > > > be like this: > > > > > > SUGGESTION > > > > > > static void > > > validate_primary_info(WalReceiverConn *wrconn, bool *master_is_standby) > > > > > > or > > > > > > static void > > > validate_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) > > > > > > > The terminology master_is_standby is a bit indirect for this usage, so > > I would prefer the second one. Shall we name this function as > > check_primary_info()? Additionally, can we rewrite the following > > comment: "Using the specified primary server connection, check whether > > we are cascading standby. It also validates primary_slot_info for > > non-cascading-standbys.". > > > > -- > > With Regards, > > Amit Kapila. > > > PFA v48. Changes are: > Sorry, I missed attaching the patch. PFA v48. > 1) Addressed comments by Peter for v45-002 and v47-002 given in [1] > and [2] respectively > 2) Addressed comments by Amit for v47-002 given in [3], [4] > 3) Addressed an old comment (#74 in [5]) of getting rid of header > inclusion from tablesync.c when there was no code change in that file. > Thanks Hou-san for working on this change. > > > TODO: > --Address the test comments in [1] for 050_standby_failover_slots_sync.pl > --Review the feasibility of addressing one pending comment (comment 13 > in [5]) of 'r'->'n' conversion. 
> > [1]: https://www.postgresql.org/message-id/CAHut%2BPtOc7J_n24HJ6f_dFWTuD3X2ApOByQzZf6jZz%2B0wb-ebQ%40mail.gmail.com > [2]: https://www.postgresql.org/message-id/CAHut%2BPsvxs-%3Dj3aCpPVs3e4w78HndCdO-F4bLPzAX70%2BdgWUuQ%40mail.gmail.com > [3]: https://www.postgresql.org/message-id/CAA4eK1L2ts%3DgfiF4aw7-DH8HWj29s08hVRq-Ff8%3DmjfdUXx8CA%40mail.gmail.com > [4]: https://www.postgresql.org/message-id/CAA4eK1%2Bw9yv%2B4UZXhiDHZpGDfbeRHYDBu23FwsniS8sYUZeu1w%40mail.gmail.com > [5]: https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com > > thanks > Shveta
Attachment
On Wed, Dec 13, 2023 at 3:53 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Shveta, here are some review comments for v45-0002. > Thanks for the feedback. Addressed these in v48. Please find my comments on some. > ====== > doc/src/sgml/bgworker.sgml > > 1. > + <variablelist> > + <varlistentry> > + <term><literal>BgWorkerStart_PostmasterStart</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_PostmasterStart</primary></indexterm> > + Start as soon as postgres itself has finished its own initialization; > + processes requesting this are not eligible for database connections. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>BgWorkerStart_ConsistentState</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_ConsistentState</primary></indexterm> > + Start as soon as a consistent state has been reached in a hot-standby, > + allowing processes to connect to databases and run read-only queries. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>BgWorkerStart_RecoveryFinished</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_RecoveryFinished</primary></indexterm> > + Start as soon as the system has entered normal read-write state. Note > + that the <literal>BgWorkerStart_ConsistentState</literal> and > + <literal>BgWorkerStart_RecoveryFinished</literal> are equivalent > + in a server that's not a hot standby. > + </para> > + </listitem> > + </varlistentry> > + > + <varlistentry> > + <term><literal>BgWorkerStart_ConsistentState_HotStandby</literal></term> > + <listitem> > + <para> > + <indexterm><primary>BgWorkerStart_ConsistentState_HotStandby</primary></indexterm> > + Same meaning as <literal>BgWorkerStart_ConsistentState</literal> but > + it is more strict in terms of the server i.e. start the worker only > + if it is hot-standby. > + </para> > + </listitem> > + </varlistentry> > + </variablelist> > > Maybe reorder these slightly, because I felt it is better if the > BgWorkerStart_ConsistentState_HotStandby comes next after > BgWorkerStart_ConsistentState, which it refers to > > For example:: > 1st.BgWorkerStart_PostmasterStart > 2nd.BgWorkerStart_ConsistentState > 3rd.BgWorkerStart_ConsistentState_HotStandby > 4th.BgWorkerStart_RecoveryFinished > > ====== > doc/src/sgml/config.sgml > > 2. > <varname>enable_syncslot</varname> = true > > Not sure, but I thought the "= true" part should be formatted too. > > SUGGESTION > <literal>enable_syncslot = true</literal> > > ====== > doc/src/sgml/logicaldecoding.sgml > > 3. > + <para> > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the failover option during slot creation and setting > + <varname>enable_syncslot</varname> on the standby. For the synchronization > + to work, it is mandatory to have a physical replication slot between the > + primary and the standby. It's highly recommended that the said physical > + replication slot is listed in <varname>standby_slot_names</varname> on > + the primary to prevent the subscriber from consuming changes faster than > + the hot standby. Additionally, <varname>hot_standby_feedback</varname> > + must be enabled on the standby for the slots synchronization to work. > + </para> > > I felt those parts that describe the mandatory GUCs should be kept together. 
> > SUGGESTION > For the synchronization to work, it is mandatory to have a physical > replication slot between the primary and the standby, and > <varname>hot_standby_feedback</varname> must be enabled on the > standby. > > It's also highly recommended that the said physical replication slot > is named in <varname>standby_slot_names</varname> list on the primary, > to prevent the subscriber from consuming changes faster than the hot > standby. > > ~~~ > > 4. (Chapter 49) > > By enabling synchronization of slots, logical replication can be > resumed after failover depending upon the > pg_replication_slots.sync_state for the synchronized slots on the > standby at the time of failover. Only slots that were in ready > sync_state ('r') on the standby before failover can be used for > logical replication after failover. However, the slots which were in > initiated sync_state ('i') and not sync-ready ('r') at the time of > failover will be dropped and logical replication for such slots can > not be resumed after failover. This applies to the case where a > logical subscription is disabled before failover and is enabled after > failover. If the synchronized slot due to disabled subscription could > not be made sync-ready ('r') on standby, then the subscription can not > be resumed after failover even when enabled. If the primary is idle, > then the synchronized slots on the standby may take a noticeable time > to reach the ready ('r') sync_state. This can be sped up by calling > the pg_log_standby_snapshot function on the primary. > > ~ > > Somehow, I still felt all that was too wordy/repetitive. Below is my > attempt to make it more concise. Thoughts? > > SUGGESTION > The ability to resume logical replication after failover depends upon > the pg_replication_slots.sync_state value for the synchronized slots > on the standby at the time of failover. Only slots that have attained > a "ready" sync_state ('r') on the standby before failover can be used > for logical replication after failover. Slots that have not yet > reached 'r' state (they are still 'i') will be dropped, therefore > logical replication for those slots cannot be resumed. For example, if > the synchronized slot could not become sync-ready on standby due to a > disabled subscription, then the subscription cannot be resumed after > failover even when it is enabled. > > If the primary is idle, the synchronized slots on the standby may take > a noticeable time to reach the ready ('r') sync_state. This can be > sped up by calling the pg_log_standby_snapshot function on the > primary. > > ====== > doc/src/sgml/system-views.sgml > > 5. > + <para> > + Defines slot synchronization state. This is meaningful on the physical > + standby which has configured <varname>enable_syncslot</varname> = true > + </para> > > As mentioned in the previous review comment ([1]#10) I thought it > might be good to include a hyperlink cross-reference to the > 'enable_syncslot' GUC. > > ~~~ > > 6. > + <para> > + The hot standby can have any of these sync_state for the slots but on a > + hot standby, the slots with state 'r' and 'i' can neither be used for > + logical decoding nor dropped by the user. The primary server will have > + sync_state as 'n' for all the slots. But if the standby is promoted to > + become the new primary server, sync_state can be seen 'r' as well. On > + this new primary server, slots with sync_state as 'r' and 'n' will > + behave the same. > + </para></entry> > > 6a. 
> /these sync_state for the slots/these sync_state values for the slots/ > > ~ > > 6b > Hm. I still felt (same as previous review [1]#12b) that there seems > too much information here. > > IIUC the sync_state is only meaningful on the standby. Sure, it might > have some values line 'n' or 'r' on the primary also, but those either > mean nothing ('n') or are leftover states from a previous failover > from a standby ('r'), which also means nothing. So can't we just say > it more succinctly like that? > > SUGGESTION > The sync_state has no meaning on the primary server; the primary > sync_state value is default 'n' for all slots but may (if leftover > from a promoted standby) also be 'r'. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 7. > static void > libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, > - bool failover) > + bool failover) > > Still seems to be tampering with indentation that should only be in patch 0001. > > ====== > src/backend/replication/logical/slotsync.c > > 8. wait_for_primary_slot_catchup > > The meaning of the boolean return of this function is still not > described by the function comment. > > ~~~ > > 9. > + * If passed, *wait_attempts_exceeded will be set to true only if this > + * function exits after exhausting its wait attempts. It will be false > + * in all the other cases like failure, remote-slot invalidation, primary > + * could catch up. > > The above already says when a return false happens, so it seems > overkill to give more information. > > SUGGESTION > If passed, *wait_attempts_exceeded will be set to true only if this > function exits due to exhausting its wait attempts. It will be false > in all the other cases. > > ~~~ > > 10. > > +static bool > +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *wait_attempts_exceeded) > +{ > +#define WAIT_OUTPUT_COLUMN_COUNT 4 > +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 > + > > 10a > Maybe the long constant name is too long. How about > WAIT_PRIMARY_CATCHUP_ATTEMPTS? > > ~~~ > > 10b. > IMO it is better to Assert the input value of this kind of side-effect > return parameter, to give a better understanding and to prevent future > accidents. > > SUGGESTION > Assert(wait_attempts_exceeded == NULL |} *wait_attempts_exceeded == false); > > ~~~ > > 11. synchronize_one_slot > > + ReplicationSlot *s; > + ReplicationSlot *slot; > + char sync_state = '\0'; > > 11a. > I don't think you need both 's' and 'slot' ReplicationSlot -- it looks > a bit odd. Can't you just reuse the one 'slot' variable? > > ~ > > 11b. > Also, maybe those assignment like > + slot = MyReplicationSlot; > > can have an explanatory comment like: > /* For convenience, we assign MyReplicationSlot to a shorter variable name. */ > I have changed it to slightly simpler one, if that is okay? > ~~~ > > 12. > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > +{ > + ReplicationSlot *s; > + ReplicationSlot *slot; > + char sync_state = '\0'; > > In my previous review [1]#33a I thought it was strange to assign the > sync_state (which is essentially an enum) to some meaningless value, > so I suggested it should be set to SYNCSLOT_STATE_NONE in the > declaration. The reply [2] was "No, that will change the flow. It > should stay uninitialized if the slot is not found." > > But I am not convinced there is any flow problem. Also, > SYNCSLOT_STATE_NONE seems the naturally correct default for something > with no state. 
It cannot be found and be SYNCSLOT_STATE_NONE at the > same time (that is reported as an ERROR "skipping sync of slot") so I > see no problem. > > The CURRENT code is like this: > > /* Slot created by the slot sync worker exists, sync it */ > if (sync_state) > { > Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == > SYNCSLOT_STATE_INITIATED); > ... > } > /* Otherwise create the slot first. */ > else > { > ... > } > > AFAICT that could easily be changed to like below, with no change to > the logic, and it avoids setting strange values. > > SUGGESTION. > > if (sync_state == SYNCSLOT_STATE_NONE) > { > /* Slot not found. Create it. */ > .. > } > else > { > Assert(sync_state == SYNCSLOT_STATE_READY || sync_state == > SYNCSLOT_STATE_INITIATED); > ... > } > I have restructured the entire code here and thus initialization of sync_state is no longer needed. Please review now and let me know. > ~~~ > > 13. synchronize_one_slot > > +static void > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *slot_updated) > > This *slot_updated parameter looks dubious. It is used in a loop from > the caller to mean that ANY slot was updated -- e.g. maybe it is true > or false on entry to this function. > > But, Instead of having some dependency between this function and the > caller, IMO it makes more sense if we would make this just a boolean > function in the first place (e.g. was updated? T/F) > > Then the caller can also be written more easily like: > > some_slot_updated |= synchronize_one_slot(wrconn, remote_slot); > > ~~~ > > 14. > + /* Search for the named slot */ > + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) > + { > + SpinLockAcquire(&s->mutex); > + sync_state = s->data.sync_state; > + SpinLockRelease(&s->mutex); > + > + /* User created slot with the same name exists, raise ERROR. */ > + if (sync_state == SYNCSLOT_STATE_NONE) > + { > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("skipping sync of slot \"%s\" as it is a user created" > + " slot", remote_slot->name), > + errdetail("This slot has failover enabled on the primary and" > + " thus is sync candidate but user created slot with" > + " the same name already exists on the standby"))); > + } > + } > > > Extra curly brackets around the ereport are not needed. > > ~~~ > > 15. > + /* > + * Sanity check: With hot_standby_feedback enabled and > + * invalidations handled apropriately as above, this should never > + * happen. > + */ > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > + { > + elog(ERROR, > + "not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > + } > > 15a. > /apropriately/appropriately/ > > ~ > > 15b. > Extra curly brackets around the elog are not needed. > > ~~~ > > 16. 
synchronize_slots > > +static bool > +synchronize_slots(WalReceiverConn *wrconn) > +{ > +#define SLOTSYNC_COLUMN_COUNT 9 > + Oid slotRow[SLOTSYNC_COLUMN_COUNT] = {TEXTOID, TEXTOID, LSNOID, > + LSNOID, XIDOID, BOOLOID, BOOLOID, TEXTOID, INT2OID}; > + > + WalRcvExecResult *res; > + TupleTableSlot *tupslot; > + StringInfoData s; > + List *remote_slot_list = NIL; > + MemoryContext oldctx = CurrentMemoryContext; > + ListCell *lc; > + bool slot_updated = false; > > Suggest renaming 'slot_updated' to 'some_slot_updated' or > 'update_occurred' etc because the current name makes it look like it > applies to a single slot, but it doesn't. > > ~~~ > > 17. > + SpinLockAcquire(&WalRcv->mutex); > + if (!WalRcv || > + (WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return slot_updated; > + } > + SpinLockRelease(&WalRcv->mutex); > > IMO "return false;" here is more clear than saying "return slot_updated;" > > ~~~ > > 18. > + appendStringInfo(&s, > + "SELECT slot_name, plugin, confirmed_flush_lsn," > + " restart_lsn, catalog_xmin, two_phase, failover," > + " database, pg_get_slot_invalidation_cause(slot_name)" > + " FROM pg_catalog.pg_replication_slots" > + " WHERE failover and sync_state != 'i'"); > > 18a. > /and/AND/ > > ~ > > 18b. > In the reply post (see [2]#32) Shveta said "I could not find quote_* > function for a character just like we have 'quote_literal_cstr' for > string". If you still want to use constant substitution instead of > just hardwired 'i' then why do even you need a quote_* function? I > thought the appendStringInfo uses a printf style format-string > internally, so I assumed it is possible to substitute the state char > directly using '%c'. > Since we have removed cascading standby support, this condition (sync_state != 'i') is no longer needed in the query. > ~~~ > > 19. > + > + > + > + /* We are done, free remote_slot_list elements */ > + list_free_deep(remote_slot_list); > + > + walrcv_clear_result(res); > + > + return slot_updated; > +} > > Excessive blank lines. > > ~~~ > > 20. validate_primary_slot > > + appendStringInfo(&cmd, > + "SELECT count(*) = 1 from pg_replication_slots " > + "WHERE slot_type='physical' and slot_name=%s", > + quote_literal_cstr(PrimarySlotName)); > > > /and/AND/ > > ~~~ > > 21. > + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > + tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, slot); > + Assert(tuple_ok); /* It must return one tuple */ > > IMO it's better to use all the var names the same across all > functions? So call this 'tupslot' like the other > MakeSingleTupleTableSlot result. > > ~~~ > > 22. validate_slotsync_parameters > > +/* > + * Checks if GUCs are set appropriately before starting slot sync worker > + * > + * The slot sync worker can not start if 'enable_syncslot' is off and > + * since 'enable_syncslot' is ON, check that the other GUC settings > + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) > + * are compatible with slot synchronization. If not, raise ERROR. > + */ > +static void > +validate_slotsync_parameters(char **dbname) > +{ > > 22a. > The comment is quite verbose. IMO the 2nd para seems just unnecessary > detail of the 1st para. > > SUGGESTION > Check that all necessary GUCs for slot synchronization are set > appropriately. If not, raise an ERROR. > > ~~~ > > 22b. 
> IMO (and given what was said in the comment about enable_syncslot must > be on)the first statement of this function should be: > > /* Sanity check. */ > Assert(enable_syncslot); > > ~~~ > > 23. slotsync_reread_config > > + old_dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); > + Assert(old_dbname); > > (This is same comment as old review [1]#61) > > Hmm. I still don't see why this extraction of the dbname cannot be > deferred until later when you know the PrimaryConnInfo has changed, > otherwise, it might be redundant to do this. Shveta replied [2] that > "Once PrimaryConnInfo is changed, we can not get old-dbname.", but I'm > not so sure. Isn't this walrcv_get_dbname_from_conninfo just doing a > string search -- Why can't you defer this until you know > conninfoChanged is true, and then to get the old_dbname, you can just > pass the old_primary_conninfo. E.g. call like > walrcv_get_dbname_from_conninfo(old_primary_conninfo); Maybe I am > mistaken. > Sorry missed your point earlier that we can use old_primary_conninfo to extract dbname later. I have removed this re-validation now as we will restart the worker in case of a parameter change similar to the case of logical apply worker. So these changes are no longer needed. > ~~ > > 24. > + /* > + * Since we have initialized this worker with the old dbname, thus > + * exit if dbname changed. Let it get restarted and connect to the new > + * dbname specified. > + */ > + if (conninfoChanged && strcmp(old_dbname, new_dbname) != 0) > + ereport(ERROR, > + errmsg("exiting slot sync worker as dbname in " > + "primary_conninfo changed")); > > IIUC when the tablesync has to restart, it emits a LOG message before > it exits; but it's not an ERROR. So, shouldn't this be similar -- IMO > it is not an "error" for the user to wish to change the dbname. Maybe > this should be LOG followed by an explicit exit. If you agree, then it > might be better to encapsulate such logic in some little function: > > // pseudo-code > void slotsync_worker_restart(const char *msg) > { > ereport(LOG, msg... > exit(0); > } > we can not do proc_exit(0), as then postmaster will not restart it on clean-exit. I agree with your logic, but will have another argument in this function to accept 'exit code' from the caller. > ~~~ > > 25. ReplSlotSyncWorkerMain > > + for (;;) > + { > + int rc; > + long naptime = WORKER_DEFAULT_NAPTIME_MS; > + TimestampTz now; > + bool slot_updated; > + > + ProcessSlotSyncInterrupts(wrconn); > + > + slot_updated = synchronize_slots(wrconn); > > Here I think the 'slot_updated' should be renamed to the same name as > in #16 above (e.g. 'some_slot_updated' or 'any_slot_updated' or > 'update_occurred' etc). > > ~~~ > > 26. SlotSyncWorkerRegister > > + if (!enable_syncslot) > + { > + ereport(LOG, > + errmsg("skipping slots synchronization because enable_syncslot is " > + "disabled.")); > + return; > + } > > Instead of saying "because..." in the error message maybe keep the > message more terse and describe the "because" part in the errdetail > > SUGGESTION > errmsg("skipping slot synchronization") > errdetail("enable_syncslot is disabled.") > > > ====== > src/backend/replication/slot.c > > 27. > + * sync_state: Defines slot synchronization state. 
For user created slots, it > + * is SYNCSLOT_STATE_NONE and for the slots being synchronized on the physical > + * standby, it is either SYNCSLOT_STATE_INITIATED or SYNCSLOT_STATE_READY > */ > void > ReplicationSlotCreate(const char *name, bool db_specific, > ReplicationSlotPersistency persistency, > - bool two_phase, bool failover) > + bool two_phase, bool failover, char sync_state) > > > 27a. > Why is this comment even mentioning SYNCSLOT_STATE_READY? IIUC it > doesn't make sense to ever call ReplicationSlotCreate directly setting > the 'r' state (e.g., bypassing 'i' ???) > > ~ > > 27b. > Indeed, IMO there should be Assert(sync_state == SYNCSLOT_STATE_NONE > || syncstate == SYNCSLOT_STATE_INITIATED); to guarantee this. > > ====== > src/include/replication/slot.h > > 28. > + /* > + * Is this a slot created by a sync-slot worker? > + * > + * Only relevant for logical slots on the physical standby. > + */ > + char sync_state; > + > > (probably I am repeating a previous thought here) > > The comment says the field is only relevant for standby, and that's > how I've been visualizing it, and why I had previously suggested even > renaming it to 'standby_sync_state'. However, replies are saying that > after failover these sync_states also have "some meaning for the > primary server". > > That's the part I have trouble understanding. IIUC the server states > are just either all 'n' (means nothing) or 'r' because they are just > leftover from the old standby state. So, does it *truly* have meaning > for the server? Or should those states somehow be removed/ignored on > the new primary? Anyway, the point is that if this field does have > meaning also on the primary (I doubt) then those details should be in > this comment. Otherwise "Only relevant ... on the standby" is too > misleading. > I have modified it currently, but I will give another thought on your suggestions here (and in earlier emails) and will let you know. > ====== > .../t/050_standby_failover_slots_sync.pl > We are working on CFbot failure fixes in this file and restructing the tests here. Thus I am keeping these test comments on hold and will address in next version. > 29. > +# Create table and publication on primary > +$primary->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int > PRIMARY KEY);"); > +$primary->safe_psql('postgres', "CREATE PUBLICATION mypub3 FOR TABLE > tab_mypub3;"); > + > > 29a. > /on primary/on the primary/ > > ~ > > 29b. > Consider to combine those DDL > > ~ > > 29c. > Perhaps for consistency, you should be calling this 'regress_mypub3'. > > ~~~ > > 30. > +# Create a subscriber node > +my $subscriber3 = PostgreSQL::Test::Cluster->new('subscriber3'); > +$subscriber3->init(allows_streaming => 'logical'); > +$subscriber3->start; > +$subscriber3->safe_psql('postgres', "CREATE TABLE tab_mypub3 (a int > PRIMARY KEY);"); > + > +# Create a subscription with failover = true & wait for sync to complete. > +$subscriber3->safe_psql('postgres', > + "CREATE SUBSCRIPTION mysub3 CONNECTION '$publisher_connstr' " > + . "PUBLICATION mypub3 WITH (slot_name = lsub3_slot, failover = true);"); > +$subscriber3->wait_for_subscription_sync; > > 30a > Consider combining those DDLs. > > ~ > > 30b. > Probably for consistency, you should be calling this 'regress_mysub3'. > > ~~~ > > 31. 
> +# Advance lsn on the primary > +$primary->safe_psql('postgres', > + "SELECT pg_log_standby_snapshot();"); > +$primary->safe_psql('postgres', > + "SELECT pg_log_standby_snapshot();"); > +$primary->safe_psql('postgres', > + "SELECT pg_log_standby_snapshot();"); > + > > Consider combining all those DDLs. > > ~~~ > > 32. > +# Truncate table on primary > +$primary->safe_psql('postgres', > + "TRUNCATE TABLE tab_mypub3;"); > + > +# Insert data on the primary > +$primary->safe_psql('postgres', > + "INSERT INTO tab_mypub3 SELECT generate_series(1, 10);"); > + > > Consider combining those DDLs. > > ~~~ > > 33. > +# Confirm that restart_lsn of lsub3_slot slot is synced to the standby > +$result = $standby3->safe_psql('postgres', > + qq[SELECT '$primary_lsn' >= restart_lsn from pg_replication_slots > WHERE slot_name = 'lsub3_slot';]); > +is($result, 't', 'restart_lsn of slot lsub3_slot synced to standby'); > > > Does "'$primary_lsn' >= restart_lsn" make sense here? NOTE, the sign > was '<=' in v43-0002 > > ~~~ > > 34. > +# Confirm that confirmed_flush_lsn of lsub3_slot slot is synced to the standby > +$result = $standby3->safe_psql('postgres', > + qq[SELECT '$primary_lsn' >= confirmed_flush_lsn from > pg_replication_slots WHERE slot_name = 'lsub3_slot';]); > +is($result, 't', 'confirmed_flush_lsn of slot lsub3_slot synced to > the standby'); > > Does "'$primary_lsn' >= confirmed_flush_lsn" make sense here? NOTE, > the sign was '<=' in v43-0002 > > ~~~ > > 35. > +################################################## > +# Test that synchronized slot can neither be docoded nor dropped by the user > +################################################## > > 35a. > /docoded/decoded/ > > ~ > > 35b. > Please give explanation in the comment *why* those ops are not allowed > (e.g. because the hot_standby_feedback GUC does not have an accepted > value) > > ~~~ > > 36. > +################################################## > +# Create another slot which stays in sync_state as initiated ('i') > +################################################## > + > > Please explain the comment as to *why* it gets stuck in the initiated state. > > > ~~~ > > 37. > +################################################## > +# Promote the standby3 to primary. Confirm that: > +# a) the sync-ready('r') slot 'lsub3_slot' is retained on new primary > +# b) the initiated('i') slot 'logical_slot'is dropped on promotion > +# c) logical replication for mysub3 is resumed succesfully after failover > +################################################## > > > /'logical_slot'is/'logical_slot' is/ (missing space) > > /succesfully/successfully/ > > ~~~ > > 38. > +# Update subscription with new primary's connection info > +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 DISABLE;"); > +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 > CONNECTION '$standby3_conninfo';"); > +$subscriber3->safe_psql('postgres', "ALTER SUBSCRIPTION mysub3 ENABLE;"); > > > Consider combining all those DDLs. > > ~~~ > > 39. > + > +# Insert data on the new primary > +$standby3->safe_psql('postgres', > + "INSERT INTO tab_mypub3 SELECT generate_series(11, 20);"); > + > +# Confirm that data in tab_mypub3 replicated on subscriber > +is( $subscriber3->safe_psql('postgres', q{SELECT count(*) FROM tab_mypub3;}), > + "20", > + 'data replicated from new primary'); > > Shouldn't there be some wait_for_subscription_sync logic (or similar) > here just to ensure the subscriber3 had time to receive that data > before you immediately check that it had arrived? 
> > ====== > [1] My v43-0002 review. > https://www.postgresql.org/message-id/CAHut%2BPuuqEpDse5msENsVuK3rjTRN-QGS67rRCGVv%2BzcT-f0GA%40mail.gmail.com > [2] Replies to v43-0002 review. > https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY%2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com > > Kind Regards, > Peter Smith. > Fujitsu Australia
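Following the reply to comment 24 above, a hedged sketch of what the log-and-exit helper could look like once it takes the exit code from the caller; the patch's eventual slotsync_worker_exit() may differ in detail.

/*
 * Sketch only: log why the slot sync worker is exiting and exit with the
 * caller-supplied code. A non-zero code lets the postmaster restart the
 * worker, whereas proc_exit(0) is treated as a clean shutdown and the
 * worker would not be restarted.
 */
static void
slotsync_worker_exit(const char *msg, int code)
{
	ereport(LOG, errmsg("%s", msg));
	proc_exit(code);
}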
On Thu, Dec 14, 2023 at 10:15 AM Peter Smith <smithpb2250@gmail.com> wrote: > > A review comment for v47-0001 > Thanks for reviewing. I have addressed these in v48. There is a design change around the code that checked for a cascading standby and revalidated new GUC values on config reload, so the code has changed entirely in the area where some of these comments applied. Please review now and let me know. > ====== > src/backend/replication/slot.c > > 1. GetStandbySlotList > > +static void > +WalSndRereadConfigAndReInitSlotList(List **standby_slots) > +{ > + char *pre_standby_slot_names; > + > + ProcessConfigFile(PGC_SIGHUP); > + > + /* > + * If we are running on a standby, there is no need to reload > + * standby_slot_names since we do not support syncing slots to cascading > + * standbys. > + */ > + if (RecoveryInProgress()) > + return; > > Should the RecoveryInProgress() check be first -- even before the > ProcessConfigFile call? > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
Review for v47 patch - (1) When we try to create a subscription on the standby using a synced slot that is in 'r' sync_state, the subscription will be created at the subscriber, and on the standby, two actions will take place - (i) As copy_data is true by default, it will switch the failover state of the synced slot to 'false'. (ii) As we don't allow synced slots to be used, it will start giving the expected error in the log file - ERROR: cannot use replication slot "logical_slot" for logical decoding DETAIL: This slot is being synced from the primary server. HINT: Specify another replication slot. The first one seems to be an issue: it toggles failover to false, and it then remains false after that. I think it should be fixed. (2) With the patch, the 'CREATE SUBSCRIPTION' command with a 'slot_name' of an 'active' logical slot fails and errors out - ERROR: could not alter replication slot "logical_slot" on publisher: ERROR: replication slot "logical_slot1" is active for PID xxxx Without the patch, the create subscription with an 'active' slot_name succeeds and the log file shows the error "could not start WAL streaming: ERROR: replication slot "logical_slot" is active for PID xxxx". Given that the specified active slot_name has failover set to false and the create subscription command also specifies failover=false, the behavior of the "with-patch" case is expected to be the same as that of the "without-patch" scenario.
On Fri, Dec 15, 2023 at 5:55 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > (1) > When we try to create a subscription on standby using a synced slot > that is in 'r' sync_state, the subscription will be created at the > subscriber, and on standby, two actions will take place - > (i) As copy_data is true by default, it will switch the failover > state of the synced-slot to 'false'. > (ii) As we don't allow to use synced-slots, it will start giving > the expected error in the log file - > ERROR: cannot use replication slot "logical_slot" for logical decoding > DETAIL: This slot is being synced from the primary server. > HINT: Specify another replication slot. > > The first one seems an issue, it toggles the failover to false and > then it remains false after that. I think it should be fixed. > +1. If we don't allow the slot to be used, we shouldn't allow its state to be changed as well. > (2) > With the patch, the 'CREATE SUBSCRIPTION' command with a 'slot_name' > of an 'active' logical slot fails and errors out - > ERROR: could not alter replication slot "logical_slot" on > publisher: ERROR: replication slot "logical_slot1" is active for PID > xxxx > > Without the patch, the create subscription with an 'active' slot_name > succeeds and the log file shows the error "could not start WAL > streaming: ERROR: replication slot "logical_slot" is active for PID > xxxx". > > Given that the specified active slot_name has failover set to false > and the create subscription command also specifies failover=false, the > expected behavior of the "with-patch" case is anticipated to be the > same as that of the "without-patch" scenario. > Currently, we first acquire the slot to change its state but I guess if we want the behavior as you mentioned we first need to check the slot's 'failover' state without acquiring the slot. I am not sure if that is any better because anyway we are going to fail in the very next step as the slot is busy. -- With Regards, Amit Kapila.
On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > Sorry, I missed attaching the patch. PFA v48. > Few comments on v48_0002 ======================== 1. +static void +slotsync_reread_config(WalReceiverConn *wrconn) { ... + pfree(old_primary_conninfo); + pfree(old_primary_slotname); + + if (restart) + { + char *msg = "slot sync worker will restart because of a parameter change"; + + /* + * The exit code 1 will make postmaster restart the slot sync worker. + */ + slotsync_worker_exit(msg, 1 /* proc_exit code */ ); + } ... I don't see the need to explicitly pfree in case we are already exiting the process because anyway the memory will be released. We can avoid using the 'restart' variable for this. Also, probably, directly exiting here makes sense and at another place where this function is used. I see that in maybe_reread_subscription(), we exit with a 0 code and still apply worker restarts, so why use a different exit code here? 2. +static void +check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) { ... + remote_in_recovery = DatumGetBool(slot_getattr(tupslot, 1, &isnull)); + Assert(!isnull); + + /* No need to check further, return that we are cascading standby */ + if (remote_in_recovery) + { + *am_cascading_standby = true; + CommitTransactionCommand(); + return; ... } Don't we need to clear the result and tuple in case of early return? 3. It would be a good idea to mention about requirements like a physical slot on primary, hot_standby_feedback, etc. in the commit message. 4. +static bool +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *wait_attempts_exceeded) { ... + tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) + { + ereport(WARNING, + (errmsg("slot \"%s\" creation aborted", remote_slot->name), + errdetail("This slot was not found on the primary server"))); ... + /* + * It is possible to get null values for LSN and Xmin if slot is + * invalidated on the primary server, so handle accordingly. + */ + new_invalidated = DatumGetBool(slot_getattr(tupslot, 1, &isnull)); + Assert(!isnull); + + new_restart_lsn = DatumGetLSN(slot_getattr(tupslot, 2, &isnull)); + if (new_invalidated || isnull) + { + ereport(WARNING, + (errmsg("slot \"%s\" creation aborted", remote_slot->name), + errdetail("This slot was invalidated on the primary server"))); ... } a. The errdetail message should end with a full stop. Please check all other errdetail messages in the patch to follow the same guideline. b. I think saying slot creation aborted is not completely correct because we would have created the slot especially when it is in 'i' state. Can we change it to something like: "aborting initial sync for slot \"%s\""? c. Also, if the remote_slot is invalidated, ideally, we can even drop the local slot but it seems that the patch will drop the same before the next-sync cycle with any other slot that needs to be dropped. If so, can we add the comment to indicate the same? 5. 
+static void +local_slot_update(RemoteSlot *remote_slot) +{ + Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE); + + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, + remote_slot->catalog_xmin); + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, + remote_slot->restart_lsn); + + SpinLockAcquire(&MyReplicationSlot->mutex); + MyReplicationSlot->data.invalidated = remote_slot->invalidated; + SpinLockRelease(&MyReplicationSlot->mutex); ... ... If required, the invalidated flag is updated in the caller as well, so why do we need to update it here as well? -- With Regards, Amit Kapila.
On Monday, December 11, 2023 5:31 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Dec 7, 2023 at 1:33 PM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > Hi. > > > > Here are my review comments for patch v43-0002. > > > > > ====== > > src/backend/access/transam/xlogrecovery.c > > > > 13. > > + /* > > + * Shutdown the slot sync workers to prevent potential conflicts between > > + * user processes and slotsync workers after a promotion. Additionally, > > + * drop any slots that have initiated but not yet completed the sync > > + * process. > > + */ > > + ShutDownSlotSync(); > > + slotsync_drop_initiated_slots(); > > + > > > > Is this where maybe the 'sync_state' should also be updated for > > everything so you are not left with confusion about different states > > on a node that is no longer a standby node? > > > > yes, this is the place. But this needs more thought as it may cause > too much disk activity during promotion. so let me analyze and come > back. Per off-list discussion with Amit, I think it's fine to keep both READY and NONE on a primary, because even if we update the sync_state from READY to NONE on promotion, it doesn't reduce the complexity of handling the READY and NONE states. It's also not straightforward to choose the right place to update sync_state; here is the analysis: (related steps on promotion) 1 (patch) shut down the slotsync worker 2 (patch) drop 'i' state slots 3 remove standby.signal and recovery.signal 4 switch to a new timeline and write the timeline history file 5 set SharedRecoveryState = RECOVERY_STATE_DONE, which means RecoveryInProgress() will return false. We could not update the sync_state before step 3, because if the update fails after updating only some of the slots' states, the server will be shut down leaving some 'NONE' state slots. After restarting, the server is still a standby, so the slot sync worker will fail to sync these 'NONE' state slots. We also could not update it after step 3 and before step 4, because if any ERROR occurs during the update, then after a restart the server will become a primary (as standby.signal has been removed), but it can still be made an active standby again by creating a standby.signal file, because the timeline has not been switched. In that case, the slot sync worker will also fail to sync these 'NONE' state slots. Updating the sync_state after step 4 and before step 5 is OK, but it still doesn't simplify the handling of both READY and NONE state slots. Therefore, I think we can retain the READY state slots after promotion, as they can provide information about the slot's origin. I added some comments around the slotsync cleanup code (in FinishWalRecovery) to mention the reason. Best Regards, Hou zj
On Friday, December 15, 2023 1:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > TODO: > --Address the test comments in [1] for 050_standby_failover_slots_sync.pl > --Review the feasibility of addressing one pending comment (comment 13 in > [5]) of 'r'->'n' conversion. Here is the V49 patch set, which addresses the above TODO items. The patch set also includes the following changes: V49-0001 1) Added some documentation to mention that it is the user's responsibility to ensure that table sync has completed before switching the subscriber to the new primary. 2) Fixed one CFbot failure in 050_standby_failover_slots_sync.pl. V49-0002 1) Added a few comments to mention why we retain the READY state after promotion. 2) Prevented the user from altering slots that are being synced. 3) Fixed one CFbot failure in 050_standby_failover_slots_sync.pl. 4) Improved 050_standby_failover_slots_sync.pl to remove some unnecessary operations. V49-0003 There is one unstable test in V48-0002 that validates the restart_lsn of a synced slot. We test it by checking "'$primary_restart_lsn' <= restart_lsn", which would wrongly allow the standby to be ahead of the primary. It may also fail randomly, as the standby may still be lagging behind the primary if the slot-sync worker has gone for a longer nap (10 sec) and has not picked up the slot changes yet, and we cannot put a 10 sec sleep here. We may consider removing this test, as it may be enough to verify that logical replication proceeds correctly from the synced slots on the new primary. So I have temporarily moved it into a separate patch for review. Thanks to Ajin for working on the test-case improvements. > > [1]: > https://www.postgresql.org/message-id/CAHut%2BPtOc7J_n24HJ6f_dFWTu > D3X2ApOByQzZf6jZz%2B0wb-ebQ%40mail.gmail.com > [5]: > https://www.postgresql.org/message-id/CAJpy0uDcOf5Hvk_CdCCAbfx9SY% > 2Bog%3D%3D%3DtgiuhWKzkYyqebui9g%40mail.gmail.com Best Regards, Hou zj
Attachment
Here are some review comments for v48-0002 ====== doc/src/sgml/config.sgml 1. + If slot synchronization is enabled then it is also necessary to + specify <literal>dbname</literal> in the + <varname>primary_conninfo</varname> string. This will only be used for + slot synchronization. It is ignored for streaming. I felt the "If slot synchronization is enabled" part should also include an xref to the enable_slotsync GUC, otherwise there is no information here about how to enable it. SUGGESTION If slot synchronization is enabled (see XXX) .... ====== doc/src/sgml/logicaldecoding.sgml 2. + <para> + The ability to resume logical replication after failover depends upon the + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield> + value for the synchronized slots on the standby at the time of failover. + Only slots that have attained "ready" sync_state ('r') on the standby + before failover can be used for logical replication after failover. Slots + that have not yet reached 'r' state (they are still 'i') will be dropped, + therefore logical replication for those slots cannot be resumed. For + example, if the synchronized slot could not become sync-ready on standby + due to a disabled subscription, then the subscription cannot be resumed + after failover even when it is enabled. + If the primary is idle, then the synchronized slots on the standby may + take a noticeable time to reach the ready ('r') sync_state. This can + be sped up by calling the + <function>pg_log_standby_snapshot</function> function on the primary. + </para> 2a. /sync-ready on standby/sync-ready on the standby/ ~ 2b. Should "If the primary is idle" be in a new paragraph? ====== doc/src/sgml/system-views.sgml 3. + <para> + The hot standby can have any of these sync_state values for the slots but + on a hot standby, the slots with state 'r' and 'i' can neither be used + for logical decoding nor dropped by the user. + The sync_state has no meaning on the primary server; the primary + sync_state value is default 'n' for all slots but may (if leftover + from a promoted standby) also be 'r'. + </para></entry> I still feel we are exposing too much useless information about the primary server values. Isn't it sufficient to just say "The sync_state values have no meaning on a primary server.", and not bother to mention what those meaningless values might be -- e.g. if they are meaningless then who cares what they are or how they got there? ====== src/backend/replication/logical/slotsync.c 4. synchronize_one_slot + /* Slot ready for sync, so sync it. */ + if (sync_state == SYNCSLOT_STATE_READY) + { + /* + * Sanity check: With hot_standby_feedback enabled and + * invalidations handled appropriately as above, this should never + * happen. + */ + if (remote_slot->restart_lsn < slot->data.restart_lsn) + elog(ERROR, + "not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(slot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); + + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush || + remote_slot->restart_lsn != slot->data.restart_lsn || + remote_slot->catalog_xmin != slot->data.catalog_xmin) + { + /* Update LSN of slot to remote slot's current position */ + local_slot_update(remote_slot); + ReplicationSlotSave(); + slot_updated = true; + } + } + /* Slot not ready yet, let's attempt to make it sync-ready now. 
*/ + else if (sync_state == SYNCSLOT_STATE_INITIATED) + { + /* + * Wait for the primary server to catch-up. Refer to the comment + * atop the file for details on this wait. + */ + if (remote_slot->restart_lsn < slot->data.restart_lsn || + TransactionIdPrecedes(remote_slot->catalog_xmin, + slot->data.catalog_xmin)) + { + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL)) + { + ReplicationSlotRelease(); + return false; + } + } + + /* + * Wait for primary is over, update the lsns and mark the slot as + * READY for further syncs. + */ + local_slot_update(remote_slot); + SpinLockAcquire(&slot->mutex); + slot->data.sync_state = SYNCSLOT_STATE_READY; + SpinLockRelease(&slot->mutex); + + /* Save the changes */ + ReplicationSlotMarkDirty(); + ReplicationSlotSave(); + slot_updated = true; + + ereport(LOG, + errmsg("newly locally created slot \"%s\" is sync-ready now", + remote_slot->name)); + } 4a. It would be more natural in the code if you do the SYNCSLOT_STATE_INITIATED logic before the SYNCSLOT_STATE_READY because that is the order those states come in. ~ 4b. I'm not sure if it is worth it, but I was thinking that some duplicate code can be avoided by doing if/if instead of if/else if (sync_state == SYNCSLOT_STATE_INITIATED) { .. } if (sync_state == SYNCSLOT_STATE_READY) { } By arranging it this way maybe the SYNCSLOT_STATE_INITIATED code block doesn't need to do anything except update the sync_state = SYNCSLOT_STATE_READY; Then it can just fall through to the SYNCSLOT_STATE_READY logic to do all the local_slot_update(remote_slot); etc in just one place. ~~~ 5. check_primary_info + * Checks the primary server info. + * + * Using the specified primary server connection, check whether we are cascading + * standby. It also validates primary_slot_name for non-cascading-standbys. + */ +static void +check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) 5a. /we are cascading/we are a cascading/ 5b. /non-cascading-standbys./non-cascading standbys./ ~~~ 6. + CommitTransactionCommand(); + + *am_cascading_standby = false; Maybe it's simpler just to set this default false up-front, replacing the current assert. BEFORE: + Assert(am_cascading_standby != NULL); AFTER: *am_cascading_standby = false; /* maybe overwrite later */ ~~~ 7. +/* + * Exit the slot sync worker with given exit-code. + */ +static void +slotsync_worker_exit(const char *msg, int code) +{ + ereport(LOG, errmsg("%s", msg)); + proc_exit(code); +} This could be written differently (don't pass the exit code, instead pass a bool) like: static void slotsync_worker_exit(const char *msg, bool restart_worker) By doing it this way, you can keep the special exit code values (0,1) within this function where you can comment all about them instead of having scattered comments about exit codes in the callers. SUGGESTION ereport(LOG, errmsg("%s", msg)); /* <some big comment here about how the code causes the worker to restart or not> */ proc_exit(restart_worker ? 1 : 0); ~~~ 8. slotsync_reread_config + if (restart) + { + char *msg = "slot sync worker will restart because of a parameter change"; + + /* + * The exit code 1 will make postmaster restart the slot sync worker. + */ + slotsync_worker_exit(msg, 1 /* proc_exit code */ ); + } Shouldn't that message be written as _(), so that it will get translated? SUGGESTION slotsync_worker_exit(_("slot sync worker will restart because of a parameter change"), true /* restart worker */ ); ~~~ 9. 
ProcessSlotSyncInterrupts + CHECK_FOR_INTERRUPTS(); + + if (ShutdownRequestPending) + { + char *msg = "replication slot sync worker is shutting down on receiving SIGINT"; + + walrcv_disconnect(wrconn); + + /* + * The exit code 0 means slot sync worker will not be restarted by + * postmaster. + */ + slotsync_worker_exit(msg, 0 /* proc_exit code */ ); + } Shouldn't that message be written as _(), so that it will be translated? SUGGESTION slotsync_worker_exit(_("replication slot sync worker is shutting down on receiving SIGINT"), false /* don't restart worker */ ); ~~~ 10. +/* + * Cleanup function for logical replication launcher. + * + * Called on logical replication launcher exit. + */ +static void +slotsync_worker_onexit(int code, Datum arg) +{ + SpinLockAcquire(&SlotSyncWorker->mutex); + SlotSyncWorker->pid = InvalidPid; + SpinLockRelease(&SlotSyncWorker->mutex); +} IMO it would make sense for this function to be defined adjacent to the slotsync_worker_exit() function. ~~~ 11. ReplSlotSyncWorkerMain + /* + * Using the specified primary server connection, check whether we are + * cascading standby and validates primary_slot_name for + * non-cascading-standbys. + */ + check_primary_info(wrconn, &am_cascading_standby); ... + /* Recheck if it is still a cascading standby */ + if (am_cascading_standby) + check_primary_info(wrconn, &am_cascading_standby); Those 2 above calls could be combined if you want. By defaulting the am_cascading_standby = true when declared, then you could put this code at the top of the loop instead of having the same code in 2 places: + if (am_cascading_standby) + check_primary_info(wrconn, &am_cascading_standby); ====== src/include/commands/subscriptioncmds.h 12. #include "parser/parse_node.h" +#include "replication/walreceiver.h" There is #include, but no other code change. Is this needed? ====== src/include/replication/slot.h 13. + /* + * Synchronization state for a logical slot. + * + * The standby can have any value among the possible values of 'i','r' and + * 'n'. For primary, the default is 'n' for all slots but may also be 'r' + * if leftover from a promoted standby. + */ + char sync_state; + All that is OK now, but I keep circling back to my original thought that since this state has no meaning for the primary server then a) why do we even care what potential values it might have there, and b) isn't it better to call this field 'standby_sync_state' to emphasize it only has meaning for the standby? e.g. SUGGESTION /* * Synchronization state for a logical slot. * * The standby can have any value among the possible values of 'i','r' and * 'n'. For the primary, this field value has no meaning. */ char standby_sync_state; ====== Kind Regards, Peter Smith. Fujitsu Australia
Here are some comments for the patch v49-0002. (This is in addition to my review comments for v48-0002 [1]) ====== src/backend/access/transam/xlogrecovery.c 1. FinishWalRecovery + * + * We do not update the sync_state from READY to NONE here, as any failed + * update could leave some slots in the 'NONE' state, causing issues during + * slot sync after restarting the server as a standby. While updating after + * switching to the new timeline is an option, it does not simplify the + * handling for both READY and NONE state slots. Therefore, we retain the + * READY state slots after promotion as they can provide useful information + * about their origin. + */ Do you know if that wording is correct? e.g., If you were updating from READY to NONE and there was a failed update, that would leave some slots still in a READY state, right? So why does the comment say "could leave some slots in the 'NONE' state"? ====== src/backend/replication/slot.c 2. ReplicationSlotAlter + /* + * Do not allow users to drop the slots which are currently being synced + * from the primary to the standby. + */ + if (RecoveryInProgress() && + MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot alter replication slot \"%s\"", name), + errdetail("This slot is being synced from the primary server."))); + The comment looks wrong -- should say "Do not allow users to alter..." ====== 3. +################################################## +# Test that synchronized slot can neither be decoded nor dropped by the user +################################################## + 3a, /Test that synchronized slot/Test that a synchronized slot/ 3b. Isn't there a missing test? Should this part also check that it cannot ALTER the replication slot being synced? e.g. test for the new v49 error message that was added in ReplicationSlotAlter() ~~~ 4. +# Disable hot_standby_feedback +$standby1->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;'); +$standby1->restart; + Can there be a comment added to explain why you are doing the 'hot_standby_feedback' toggle? ~~~ 5. +################################################## +# Promote the standby1 to primary. Confirm that: +# a) the sync-ready('r') slot 'lsub1_slot' is retained on the new primary +# b) the initiated('i') slot 'logical_slot' is dropped on promotion +# c) logical replication for regress_mysub1 is resumed succesfully after failover +################################################## /succesfully/successfully/ ~~~ 6. + +# Confirm that data in tab_mypub3 replicated on subscriber +is( $subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}), + "$primary_row_count", + 'data replicated from the new primary'); The comment is wrong -- it names a different table ('tab_mypub3' ?) to what the SQL says. ====== [1] My v48-0002 review comments. https://www.postgresql.org/message-id/CAHut%2BPsyZQZ1A4XcKw-D%3DvcTg16pN9Dw0PzE8W_X7Yz_bv00rQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Dec 19, 2023 at 6:58 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some comments for the patch v49-0002. > > (This is in addition to my review comments for v48-0002 [1]) > > ====== > src/backend/access/transam/xlogrecovery.c > > > 1. FinishWalRecovery > > + * > + * We do not update the sync_state from READY to NONE here, as any failed > + * update could leave some slots in the 'NONE' state, causing issues during > + * slot sync after restarting the server as a standby. While updating after > + * switching to the new timeline is an option, it does not simplify the > + * handling for both READY and NONE state slots. Therefore, we retain the > + * READY state slots after promotion as they can provide useful information > + * about their origin. > + */ > > Do you know if that wording is correct? e.g., If you were updating > from READY to NONE and there was a failed update, that would leave > some slots still in a READY state, right? So why does the comment say > "could leave some slots in the 'NONE' state"? > The comment is correct because after restart the server will start as 'standby', so 'READY' marked slots are okay but the slots that we changed to 'NONE' would now appear as user-created slots which would be wrong. -- With Regards, Amit Kapila.
On Tue, Dec 19, 2023 at 4:51 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > ====== > doc/src/sgml/system-views.sgml > > 3. > + <para> > + The hot standby can have any of these sync_state values for the slots but > + on a hot standby, the slots with state 'r' and 'i' can neither be used > + for logical decoding nor dropped by the user. > + The sync_state has no meaning on the primary server; the primary > + sync_state value is default 'n' for all slots but may (if leftover > + from a promoted standby) also be 'r'. > + </para></entry> > > I still feel we are exposing too much useless information about the > primary server values. > > Isn't it sufficient to just say "The sync_state values have no meaning > on a primary server.", and not bother to mention what those > meaningless values might be -- e.g. if they are meaningless then who > cares what they are or how they got there? > I feel it would be good to mention somewhere that primary can have slots in 'r' state, if not here, some other place. > > 7. > +/* > + * Exit the slot sync worker with given exit-code. > + */ > +static void > +slotsync_worker_exit(const char *msg, int code) > +{ > + ereport(LOG, errmsg("%s", msg)); > + proc_exit(code); > +} > > This could be written differently (don't pass the exit code, instead > pass a bool) like: > > static void > slotsync_worker_exit(const char *msg, bool restart_worker) > > By doing it this way, you can keep the special exit code values (0,1) > within this function where you can comment all about them instead of > having scattered comments about exit codes in the callers. > > SUGGESTION > ereport(LOG, errmsg("%s", msg)); > /* <some big comment here about how the code causes the worker to > restart or not> */ > proc_exit(restart_worker ? 1 : 0); > Hmm, I don't see the need for this function in the first place. We can use proc_exit in the two callers directly. -- With Regards, Amit Kapila.
On Tue, Dec 19, 2023 at 11:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 19, 2023 at 4:51 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > ====== > > doc/src/sgml/system-views.sgml > > > > 3. > > + <para> > > + The hot standby can have any of these sync_state values for the slots but > > + on a hot standby, the slots with state 'r' and 'i' can neither be used > > + for logical decoding nor dropped by the user. > > + The sync_state has no meaning on the primary server; the primary > > + sync_state value is default 'n' for all slots but may (if leftover > > + from a promoted standby) also be 'r'. > > + </para></entry> > > > > I still feel we are exposing too much useless information about the > > primary server values. > > > > Isn't it sufficient to just say "The sync_state values have no meaning > > on a primary server.", and not bother to mention what those > > meaningless values might be -- e.g. if they are meaningless then who > > cares what they are or how they got there? > > > > I feel it would be good to mention somewhere that primary can have > slots in 'r' state, if not here, some other place. > > > > > 7. > > +/* > > + * Exit the slot sync worker with given exit-code. > > + */ > > +static void > > +slotsync_worker_exit(const char *msg, int code) > > +{ > > + ereport(LOG, errmsg("%s", msg)); > > + proc_exit(code); > > +} > > > > This could be written differently (don't pass the exit code, instead > > pass a bool) like: > > > > static void > > slotsync_worker_exit(const char *msg, bool restart_worker) > > > > By doing it this way, you can keep the special exit code values (0,1) > > within this function where you can comment all about them instead of > > having scattered comments about exit codes in the callers. > > > > SUGGESTION > > ereport(LOG, errmsg("%s", msg)); > > /* <some big comment here about how the code causes the worker to > > restart or not> */ > > proc_exit(restart_worker ? 1 : 0); > > > > Hmm, I don't see the need for this function in the first place. We can > use proc_exit in the two callers directly. > > -- > With Regards, > Amit Kapila. PFA v50 patch-set which addresses comments for v48-0002 and v49-0002 given in [1], [2] and [3]. TODO: --Fix CFBot failure. --Work on correctness of test to merge patch003 to patch002 [1]: https://www.postgresql.org/message-id/CAA4eK1Ko-EBBDkea2R8V8PeveGg10PBswCF7JQdnRu%2BMJP%2BYBQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAHut%2BPsyZQZ1A4XcKw-D%3DvcTg16pN9Dw0PzE8W_X7Yz_bv00rQ%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAHut%2BPv86wBZiyOLHxycd8Yj9%3Dk5kzVa1x7Gbp%2B%3Dc1VGT9TG2w%40mail.gmail.com thanks Shveta
Attachment
On Mon, Dec 18, 2023 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > Sorry, I missed attaching the patch. PFA v48. > > > > Few comments on v48_0002 > ======================== Thanks for reviewing. These are addressed in v50. Please find my comments inline for some of these. > 1. > +static void > +slotsync_reread_config(WalReceiverConn *wrconn) > { > ... > + pfree(old_primary_conninfo); > + pfree(old_primary_slotname); > + > + if (restart) > + { > + char *msg = "slot sync worker will restart because of a parameter change"; > + > + /* > + * The exit code 1 will make postmaster restart the slot sync worker. > + */ > + slotsync_worker_exit(msg, 1 /* proc_exit code */ ); > + } > ... > > I don't see the need to explicitly pfree in case we are already > exiting the process because anyway the memory will be released. We can > avoid using the 'restart' variable for this. I have moved pfree to the end where we do not exit the worker. Removed restart variable. >Also, probably, directly > exiting here makes sense and at another place where this function is > used. I see that in maybe_reread_subscription(), we exit with a 0 code > and still apply worker restarts, so why use a different exit code > here? > Logical rep worker is started by logical rep launcher and it has different logic of restarting it. OTOH, slot-sync worker is started by the postmaster and the postmaster starts any of its bgworkers only if the worker had an abnormal exit and restart_time is given during registration of the worker. Thus we need exit_code here. I have removed the new function added though. > 2. > +static void > +check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) > { > ... > + remote_in_recovery = DatumGetBool(slot_getattr(tupslot, 1, &isnull)); > + Assert(!isnull); > + > + /* No need to check further, return that we are cascading standby */ > + if (remote_in_recovery) > + { > + *am_cascading_standby = true; > + CommitTransactionCommand(); > + return; > ... > } > > Don't we need to clear the result and tuple in case of early return? Yes, it was needed. Modified. > > 3. It would be a good idea to mention about requirements like a > physical slot on primary, hot_standby_feedback, etc. in the commit > message. > > 4. > +static bool > +wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot, > + bool *wait_attempts_exceeded) > { > ... > + tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); > + if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) > + { > + ereport(WARNING, > + (errmsg("slot \"%s\" creation aborted", remote_slot->name), > + errdetail("This slot was not found on the primary server"))); > ... > + /* > + * It is possible to get null values for LSN and Xmin if slot is > + * invalidated on the primary server, so handle accordingly. > + */ > + new_invalidated = DatumGetBool(slot_getattr(tupslot, 1, &isnull)); > + Assert(!isnull); > + > + new_restart_lsn = DatumGetLSN(slot_getattr(tupslot, 2, &isnull)); > + if (new_invalidated || isnull) > + { > + ereport(WARNING, > + (errmsg("slot \"%s\" creation aborted", remote_slot->name), > + errdetail("This slot was invalidated on the primary server"))); > ... > } > > a. The errdetail message should end with a full stop. Please check all > other errdetail messages in the patch to follow the same guideline. > b. 
I think saying slot creation aborted is not completely correct > because we would have created the slot especially when it is in 'i' > state. Can we change it to something like: "aborting initial sync for > slot \"%s\""? > c. Also, if the remote_slot is invalidated, ideally, we can even drop > the local slot but it seems that the patch will drop the same before > the next-sync cycle with any other slot that needs to be dropped. If > so, can we add the comment to indicate the same? I have added comments. Basically, it will be dropped in the caller only if it is in the 'RS_EPHEMERAL' state; if it is already persisted, it will be maintained as is but marked as invalidated in the caller, and its sync will be skipped from the next cycle onwards. > > 5. > +static void > +local_slot_update(RemoteSlot *remote_slot) > +{ > + Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE); > + > + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); > + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, > + remote_slot->catalog_xmin); > + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, > + remote_slot->restart_lsn); > + > + SpinLockAcquire(&MyReplicationSlot->mutex); > + MyReplicationSlot->data.invalidated = remote_slot->invalidated; > + SpinLockRelease(&MyReplicationSlot->mutex); > ... > ... > > If required, the invalidated flag is updated in the caller as well, so > why do we need to update it here as well? > It was needed by the part where the slot does not exist and we need to create a new slot. I have now moved the invalidation check into the caller; we do not create the slot at all if the remote_slot is found to be invalidated at the beginning. And if it is invalidated in the middle of the wait logic, then it will be dropped by ReplicationSlotRelease. > -- > With Regards, > Amit Kapila.
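To make the exit-code discussion above concrete: a background worker registered with a finite bgw_restart_time is restarted by the postmaster only after an abnormal (non-zero) exit, while exit code 0 (or registering with BGW_NEVER_RESTART) means it is not restarted. Below is a minimal, illustrative registration sketch, not the patch's actual code; the start time, restart interval, and entry-point wiring are assumptions (ReplSlotSyncWorkerMain is the worker entry point named in the patch).

```
#include "postgres.h"

#include "postmaster/bgworker.h"

/*
 * Illustrative sketch: register a slot sync worker so that the postmaster
 * restarts it after abnormal exits.
 *
 * Exit-code semantics inside the worker:
 *   proc_exit(1) - abnormal exit; the postmaster restarts the worker after
 *                  bgw_restart_time seconds.
 *   proc_exit(0) - clean exit; the worker is not restarted.
 */
static void
register_slotsync_worker_sketch(void)
{
	BackgroundWorker bgw;

	memset(&bgw, 0, sizeof(bgw));
	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
	bgw.bgw_start_time = BgWorkerStart_ConsistentState;	/* assumed */
	bgw.bgw_restart_time = 10;	/* seconds; assumed value */
	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ReplSlotSyncWorkerMain");
	snprintf(bgw.bgw_name, BGW_MAXLEN, "slot sync worker");
	snprintf(bgw.bgw_type, BGW_MAXLEN, "slot sync worker");

	RegisterBackgroundWorker(&bgw);
}
```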
On Tue, Dec 19, 2023 at 4:51 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v48-0002 > Thanks for reviewing. Most of these are addressed in v50. Please find my comments for the rest. > ====== > doc/src/sgml/config.sgml > > 1. > + If slot synchronization is enabled then it is also necessary to > + specify <literal>dbname</literal> in the > + <varname>primary_conninfo</varname> string. This will only > be used for > + slot synchronization. It is ignored for streaming. > > I felt the "If slot synchronization is enabled" part should also > include an xref to the enable_slotsync GUC, otherwise there is no > information here about how to enable it. > > SUGGESTION > If slot synchronization is enabled (see XXX) .... > > ====== > doc/src/sgml/logicaldecoding.sgml > > 2. > + <para> > + The ability to resume logical replication after failover depends upon the > + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield> > + value for the synchronized slots on the standby at the time of failover. > + Only slots that have attained "ready" sync_state ('r') on the standby > + before failover can be used for logical replication after failover. Slots > + that have not yet reached 'r' state (they are still 'i') will be dropped, > + therefore logical replication for those slots cannot be resumed. For > + example, if the synchronized slot could not become sync-ready on standby > + due to a disabled subscription, then the subscription cannot be resumed > + after failover even when it is enabled. > + If the primary is idle, then the synchronized slots on the standby may > + take a noticeable time to reach the ready ('r') sync_state. This can > + be sped up by calling the > + <function>pg_log_standby_snapshot</function> function on the primary. > + </para> > > 2a. > /sync-ready on standby/sync-ready on the standby/ > > ~ > > 2b. > Should "If the primary is idle" be in a new paragraph? > > ====== > doc/src/sgml/system-views.sgml > > 3. > + <para> > + The hot standby can have any of these sync_state values for the slots but > + on a hot standby, the slots with state 'r' and 'i' can neither be used > + for logical decoding nor dropped by the user. > + The sync_state has no meaning on the primary server; the primary > + sync_state value is default 'n' for all slots but may (if leftover > + from a promoted standby) also be 'r'. > + </para></entry> > > I still feel we are exposing too much useless information about the > primary server values. > > Isn't it sufficient to just say "The sync_state values have no meaning > on a primary server.", and not bother to mention what those > meaningless values might be -- e.g. if they are meaningless then who > cares what they are or how they got there? > I am retaining the original info till we find a better place for it as suggested by Amit in [1] > ====== > src/backend/replication/logical/slotsync.c > > 4. synchronize_one_slot > > + /* Slot ready for sync, so sync it. */ > + if (sync_state == SYNCSLOT_STATE_READY) > + { > + /* > + * Sanity check: With hot_standby_feedback enabled and > + * invalidations handled appropriately as above, this should never > + * happen. 
> + */ > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > + elog(ERROR, > + "not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > + > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush || > + remote_slot->restart_lsn != slot->data.restart_lsn || > + remote_slot->catalog_xmin != slot->data.catalog_xmin) > + { > + /* Update LSN of slot to remote slot's current position */ > + local_slot_update(remote_slot); > + ReplicationSlotSave(); > + slot_updated = true; > + } > + } > + /* Slot not ready yet, let's attempt to make it sync-ready now. */ > + else if (sync_state == SYNCSLOT_STATE_INITIATED) > + { > + /* > + * Wait for the primary server to catch-up. Refer to the comment > + * atop the file for details on this wait. > + */ > + if (remote_slot->restart_lsn < slot->data.restart_lsn || > + TransactionIdPrecedes(remote_slot->catalog_xmin, > + slot->data.catalog_xmin)) > + { > + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL)) > + { > + ReplicationSlotRelease(); > + return false; > + } > + } > + > + /* > + * Wait for primary is over, update the lsns and mark the slot as > + * READY for further syncs. > + */ > + local_slot_update(remote_slot); > + SpinLockAcquire(&slot->mutex); > + slot->data.sync_state = SYNCSLOT_STATE_READY; > + SpinLockRelease(&slot->mutex); > + > + /* Save the changes */ > + ReplicationSlotMarkDirty(); > + ReplicationSlotSave(); > + slot_updated = true; > + > + ereport(LOG, > + errmsg("newly locally created slot \"%s\" is sync-ready now", > + remote_slot->name)); > + } > > 4a. > It would be more natural in the code if you do the > SYNCSLOT_STATE_INITIATED logic before the SYNCSLOT_STATE_READY because > that is the order those states come in. > > ~ > > 4b. > I'm not sure if it is worth it, but I was thinking that some duplicate > code can be avoided by doing if/if instead of if/else > > if (sync_state == SYNCSLOT_STATE_INITIATED) > { > .. > } > if (sync_state == SYNCSLOT_STATE_READY) > { > } > > By arranging it this way maybe the SYNCSLOT_STATE_INITIATED code block > doesn't need to do anything except update the sync_state = > SYNCSLOT_STATE_READY; Then it can just fall through to the > SYNCSLOT_STATE_READY logic to do all the > local_slot_update(remote_slot); etc in just one place. > We want to mark the slot as sync-ready once initial sync is over (i.e. confirmed_lsn != NULL). But if we try to optimize as above, we will end up marking it as sync-read before initial-sync itself in local_slot_update() which does not sound like a good idea. > ~~~ > > 5. check_primary_info > > + * Checks the primary server info. > + * > + * Using the specified primary server connection, check whether we > are cascading > + * standby. It also validates primary_slot_name for non-cascading-standbys. > + */ > +static void > +check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby) > > 5a. > /we are cascading/we are a cascading/ > > 5b. > /non-cascading-standbys./non-cascading standbys./ > > ~~~ > > 6. > + CommitTransactionCommand(); > + > + *am_cascading_standby = false; > > Maybe it's simpler just to set this default false up-front, replacing > the current assert. > > BEFORE: > + Assert(am_cascading_standby != NULL); > > AFTER: > *am_cascading_standby = false; /* maybe overwrite later */ > Sure, moved default false up-front. But do we need to replace assert? 
I think assert is needed to make sure we are not accessing null-pointer later. > ~~~ > > 7. > +/* > + * Exit the slot sync worker with given exit-code. > + */ > +static void > +slotsync_worker_exit(const char *msg, int code) > +{ > + ereport(LOG, errmsg("%s", msg)); > + proc_exit(code); > +} > > This could be written differently (don't pass the exit code, instead > pass a bool) like: > > static void > slotsync_worker_exit(const char *msg, bool restart_worker) > > By doing it this way, you can keep the special exit code values (0,1) > within this function where you can comment all about them instead of > having scattered comments about exit codes in the callers. > > SUGGESTION > ereport(LOG, errmsg("%s", msg)); > /* <some big comment here about how the code causes the worker to > restart or not> */ > proc_exit(restart_worker ? 1 : 0); > I have removed slotsync_worker_exit() function as suggested by Amit in [1]. Thus few of the suggestions (7,8,10) are no longer valid relevant. > ~~~ > > 8. slotsync_reread_config > > + if (restart) > + { > + char *msg = "slot sync worker will restart because of a parameter change"; > + > + /* > + * The exit code 1 will make postmaster restart the slot sync worker. > + */ > + slotsync_worker_exit(msg, 1 /* proc_exit code */ ); > + } > > Shouldn't that message be written as _(), so that it will get translated? > > SUGGESTION > slotsync_worker_exit(_("slot sync worker will restart because of a > parameter change"), true /* restart worker */ ); > > ~~~ > > 9. ProcessSlotSyncInterrupts > > + CHECK_FOR_INTERRUPTS(); > + > + if (ShutdownRequestPending) > + { > + char *msg = "replication slot sync worker is shutting down on > receiving SIGINT"; > + > + walrcv_disconnect(wrconn); > + > + /* > + * The exit code 0 means slot sync worker will not be restarted by > + * postmaster. > + */ > + slotsync_worker_exit(msg, 0 /* proc_exit code */ ); > + } > > Shouldn't that message be written as _(), so that it will be translated? > > SUGGESTION > slotsync_worker_exit(_("replication slot sync worker is shutting down > on receiving SIGINT"), false /* don't restart worker */ ); > > ~~~ > > 10. > +/* > + * Cleanup function for logical replication launcher. > + * > + * Called on logical replication launcher exit. > + */ > +static void > +slotsync_worker_onexit(int code, Datum arg) > +{ > + SpinLockAcquire(&SlotSyncWorker->mutex); > + SlotSyncWorker->pid = InvalidPid; > + SpinLockRelease(&SlotSyncWorker->mutex); > +} > > IMO it would make sense for this function to be defined adjacent to > the slotsync_worker_exit() function. > > ~~~ > > 11. ReplSlotSyncWorkerMain > > + /* > + * Using the specified primary server connection, check whether we are > + * cascading standby and validates primary_slot_name for > + * non-cascading-standbys. > + */ > + check_primary_info(wrconn, &am_cascading_standby); > ... > + /* Recheck if it is still a cascading standby */ > + if (am_cascading_standby) > + check_primary_info(wrconn, &am_cascading_standby); > > Those 2 above calls could be combined if you want. By defaulting the > am_cascading_standby = true when declared, then you could put this > code at the top of the loop instead of having the same code in 2 > places: > > + if (am_cascading_standby) > + check_primary_info(wrconn, &am_cascading_standby); I am not very sure about this change. Yes, as you stated logic-wise it could be combined. But the current flow looks more neat while reading the code. Initializing 'am_cascading_standby' as TRUE could be slightly confusing for the reader. 
> ====== > src/include/commands/subscriptioncmds.h > > 12. > #include "parser/parse_node.h" > +#include "replication/walreceiver.h" > > There is #include, but no other code change. Is this needed? > > ====== > src/include/replication/slot.h > > 13. > + /* > + * Synchronization state for a logical slot. > + * > + * The standby can have any value among the possible values of 'i','r' and > + * 'n'. For primary, the default is 'n' for all slots but may also be 'r' > + * if leftover from a promoted standby. > + */ > + char sync_state; > + > > All that is OK now, but I keep circling back to my original thought > that since this state has no meaning for the primary server then > > a) why do we even care what potential values it might have there, and > b) isn't it better to call this field 'standby_sync_state' to > emphasize it only has meaning for the standby? > > e.g. > SUGGESTION > /* > * Synchronization state for a logical slot. > * > * The standby can have any value among the possible values of 'i','r' and > * 'n'. For the primary, this field value has no meaning. > */ > char standby_sync_state; > 'sync_state' still looks a better choice to me (discussed with others too offline). If we get more objections to this name, I can consider changing this. [1]: https://www.postgresql.org/message-id/CAA4eK1Kh2cj5vjknAxibpp8Dn%2BjjVwT%2BF7oMPT1P861s_ZrDXQ%40mail.gmail.com thanks Shveta
On Tue, Dec 19, 2023 at 6:58 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some comments for the patch v49-0002. > Thanks for reviewing. I have addressed these in v50. > (This is in addition to my review comments for v48-0002 [1]) > > ====== > src/backend/access/transam/xlogrecovery.c > > > 1. FinishWalRecovery > > + * > + * We do not update the sync_state from READY to NONE here, as any failed > + * update could leave some slots in the 'NONE' state, causing issues during > + * slot sync after restarting the server as a standby. While updating after > + * switching to the new timeline is an option, it does not simplify the > + * handling for both READY and NONE state slots. Therefore, we retain the > + * READY state slots after promotion as they can provide useful information > + * about their origin. > + */ > > Do you know if that wording is correct? e.g., If you were updating > from READY to NONE and there was a failed update, that would leave > some slots still in a READY state, right? So why does the comment say > "could leave some slots in the 'NONE' state"? > yes, it the comment is correct as stated in [1] [1]: https://www.postgresql.org/message-id/CAA4eK1LoJSbFJwa%3D97_5qHNAVfOkmfc40W_SFMVBbm6r0%3DPXHQ%40mail.gmail.com > ====== > src/backend/replication/slot.c > > 2. ReplicationSlotAlter > > + /* > + * Do not allow users to drop the slots which are currently being synced > + * from the primary to the standby. > + */ > + if (RecoveryInProgress() && > + MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot alter replication slot \"%s\"", name), > + errdetail("This slot is being synced from the primary server."))); > + > > The comment looks wrong -- should say "Do not allow users to alter..." > > ====== > > 3. > +################################################## > +# Test that synchronized slot can neither be decoded nor dropped by the user > +################################################## > + > > 3a, > /Test that synchronized slot/Test that a synchronized slot/ > > 3b. > Isn't there a missing test? Should this part also check that it cannot > ALTER the replication slot being synced? e.g. test for the new v49 > error message that was added in ReplicationSlotAlter() > > ~~~ > > 4. > +# Disable hot_standby_feedback > +$standby1->safe_psql('postgres', 'ALTER SYSTEM SET > hot_standby_feedback = off;'); > +$standby1->restart; > + > > Can there be a comment added to explain why you are doing the > 'hot_standby_feedback' toggle? > > ~~~ > > 5. > +################################################## > +# Promote the standby1 to primary. Confirm that: > +# a) the sync-ready('r') slot 'lsub1_slot' is retained on the new primary > +# b) the initiated('i') slot 'logical_slot' is dropped on promotion > +# c) logical replication for regress_mysub1 is resumed succesfully > after failover > +################################################## > > /succesfully/successfully/ > > ~~~ > > 6. > + > +# Confirm that data in tab_mypub3 replicated on subscriber > +is( $subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}), > + "$primary_row_count", > + 'data replicated from the new primary'); > > The comment is wrong -- it names a different table ('tab_mypub3' ?) to > what the SQL says. > > ====== > [1] My v48-0002 review comments. > https://www.postgresql.org/message-id/CAHut%2BPsyZQZ1A4XcKw-D%3DvcTg16pN9Dw0PzE8W_X7Yz_bv00rQ%40mail.gmail.com > > Kind Regards, > Peter Smith. 
> Fujitsu Australia
Dear Shveta, I have resumed reviewing the patch. I will play with it some more, but I can post some cosmetic comments now. ==== walsender.c 01. WalSndWaitForStandbyConfirmation ``` + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); ``` It works well, but I'm not sure whether we should use WalSndComputeSleeptime() because the function won't be called by a walsender. 02. WalSndWaitForStandbyConfirmation ``` + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, sleeptime, + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION) ``` Hmm, is it OK to use the same event as WalSndWaitForWal()? IIUC it should be avoided. 03. WalSndShmemInit() ``` + + ConditionVariableInit(&WalSndCtl->wal_confirm_rcv_cv); ``` Unnecessary blank line? ~~~~~ 050_standby_failover_slots_sync.pl 04. General My pgperltidy modified your test. Please check. 05. ``` # Create publication on the primary ``` Missing "a" before publication? 06. ``` $subscriber1->init(allows_streaming => 'logical'); ... $subscriber2->init(allows_streaming => 'logical'); ``` IIUC, these settings are not needed. 07. ``` my $primary_insert_time = time(); ``` The variable is not used. 08. ``` # Stop the standby associated with the specified physical replication slot so # that the logical replication slot won't receive changes until the standby # slot's restart_lsn is advanced or the slot is removed from the # standby_slot_names list ``` Missing comma? 09. ``` $back_q->query_until(qr//, "SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);\n"); ``` Not sure - do we have to close the back_q connection? 10. ``` # Remove the standby from the standby_slot_names list and reload the # configuration $primary->adjust_conf('postgresql.conf', 'standby_slot_names', "''"); $primary->psql('postgres', "SELECT pg_reload_conf()"); ``` a. Missing comma? b. I counted, and the reload function in perl (e.g., `$primary->reload;`) is used more often. Do you have a reason to use pg_reload_conf()? 11. ``` # Now that the standby lsn has advanced, the primary must send the decoded # changes to the subscription. $publisher->wait_for_catchup('regress_mysub1'); ``` Is the comment correct? I think the primary sends data because the GUC was modified. 12. ``` # Put the standby back on the primary_slot_name for the rest of the tests $primary->adjust_conf('postgresql.conf', 'standby_slot_names', 'sb1_slot'); $primary->restart(); ``` Just to confirm - you used restart() here because we must ensure the GUC change is propagated to all backends, right? ~~~~~ wait_event_names.txt 13. ``` +WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION "Waiting for the WAL to be received by physical standby in WAL sender process." ``` But there is a possibility that backend processes may also wait on this event, right? Best Regards, Hayato Kuroda FUJITSU LIMITED
On Tue, Dec 19, 2023 at 5:17 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Dec 18, 2023 at 4:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Sorry, I missed attaching the patch. PFA v48. > > > > > > > Few comments on v48_0002 > > ======================== > > Thanks for reviewing. These are addressed in v50. > I was still reviewing the v48 version and have a few comments as below. If some of these are already addressed or not relevant, feel free to ignore them. 1. + /* + * Slot sync worker can be stopped at any time. + * Use exit status 1 so the background worker is restarted. We don't need to start the second line of comment in a separate line. 2. + * The assumption is that these dropped local invalidated slots will get + * recreated in next sync-cycle and it is okay to drop and recreate such slots In the above line '.. local invalidated ..' sounds redundant. Shall we remove it? 3. + if (remote_slot->confirmed_lsn > WalRcv->latestWalEnd) + { + SpinLockRelease(&WalRcv->mutex); + elog(ERROR, "skipping sync of slot \"%s\" as the received slot sync " This error message looks odd to me. At least, it should be exiting instead of skipping because we won't continue after this. 4. + /* User created slot with the same name exists, raise ERROR. */ + if (sync_state == SYNCSLOT_STATE_NONE) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby"))); Same problem as above. The skipping in error message doesn't seem to be suitable for the purpose. Additionally, errdetail message should end with a full stop. 5. + /* Slot ready for sync, so sync it. */ + if (sync_state == SYNCSLOT_STATE_READY) + { + /* + * Sanity check: With hot_standby_feedback enabled and + * invalidations handled appropriately as above, this should never + * happen. + */ + if (remote_slot->restart_lsn < slot->data.restart_lsn) + elog(ERROR, + "not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " The start of the error message sounds odd. Shall we say 'cannot synchronize ...'? 6. All except one of the callers of local_slot_update() marks the slot dirty and the same is required as well. I think the remaining caller should also mark it dirty and we should move ReplicationSlotMarkDirty() in the caller space. 7. +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) { ... + /* + * Copy the invalidation cause from remote only if local slot is not + * invalidated locally, we don't want to overwrite existing one. + */ + if (slot->data.invalidated == RS_INVAL_NONE) + { + SpinLockAcquire(&slot->mutex); + slot->data.invalidated = remote_slot->invalidated; + SpinLockRelease(&slot->mutex); + } ... It doesn't seem that after changing the invalidated flag, we always mark the slot dirty. Am, I missing something? 8. + /* + * Drop local slots that no longer need to be synced. Do it before + * synchronize_one_slot to allow dropping of slots before actual sync + * which are invalidated locally while still valid on the primary server. + */ + drop_obsolete_slots(remote_slot_list); The second part of the above comment seems redundant as that is obvious. 9. 
+static WalReceiverConn * +remote_connect(void) +{ + WalReceiverConn *wrconn = NULL; + char *err; + + wrconn = walrcv_connect(PrimaryConnInfo, true, false, + cluster_name[0] ? cluster_name : "slotsyncworker", + &err); + if (wrconn == NULL) + ereport(ERROR, + (errcode(ERRCODE_CONNECTION_FAILURE), + errmsg("could not connect to the primary server: %s", err))); + return wrconn; +} Do we need a function for this? It appears to be called from just one place, so not sure if it is helpful to have a function for this. -- With Regards, Amit Kapila.
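Regarding point 7 above (copying the invalidation cause without always dirtying the slot), the following is a minimal sketch of the kind of sequence being suggested. It is not the patch's code; it assumes the slot is the currently acquired MyReplicationSlot, as in synchronize_one_slot().

```
#include "postgres.h"

#include "replication/slot.h"
#include "storage/spin.h"

/*
 * Sketch for point 7: copy an invalidation cause into the currently acquired
 * slot and make sure it reaches disk.
 */
static void
set_slot_invalidation_cause(ReplicationSlot *slot,
							ReplicationSlotInvalidationCause cause)
{
	Assert(slot == MyReplicationSlot);

	/* Don't overwrite an existing local invalidation. */
	if (slot->data.invalidated != RS_INVAL_NONE || cause == RS_INVAL_NONE)
		return;

	SpinLockAcquire(&slot->mutex);
	slot->data.invalidated = cause;
	SpinLockRelease(&slot->mutex);

	/* Without these, the new cause could be lost after a restart. */
	ReplicationSlotMarkDirty();
	ReplicationSlotSave();
}
```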
On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> wrote: > > Thanks for reviewing. I have addressed these in v50. > I was looking at this patch to see if something smaller could be independently committable. I think we can extract pg_get_slot_invalidation_cause() and commit it as that function could be independently useful as well. What do you think? -- With Regards, Amit Kapila.
On Tue, Dec 19, 2023 at 6:35 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > ==== > walsender.c > > 01. WalSndWaitForStandbyConfirmation > > ``` > + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); > ``` > > It works well, but I'm not sure whether we should use WalSndComputeSleeptime() > because the function won't be called by walsender. > I don't think it is correct to use this function because it is walsender specific, for example, it uses 'last_reply_timestamp' which won't be even initialized in the backend environment. We need to probably use a different logic for sleep here or need to use a hard-coded value. I think we should change the name of functions like WalSndWaitForStandbyConfirmation() as they are no longer used by walsender. IIRC, earlier, we had a common logic to wait from both walsender and SQL APIs which led to this naming but that is no longer true with the latest patch. > 02.WalSndWaitForStandbyConfirmation > > ``` > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, sleeptime, > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION) > ``` > > Hmm, is it OK to use the same event as WalSndWaitForWal()? IIUC it should be avoided. > Agreed. So, how about using WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION so that we can use it both from the backend and walsender? -- With Regards, Amit Kapila.
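To make the direction discussed here a little more concrete, below is a very rough sketch of a backend-safe wait that uses a hard-coded sleep instead of WalSndComputeSleeptime() and a wait event shared between walsender and backends. None of it is the patch's actual code: wal_confirm_rcv_cv is the condition variable added by the patch, WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION is the event name proposed above, and StandbyConfirmedFlush() is a purely hypothetical predicate standing in for "all slots in standby_slot_names have confirmed this LSN".

```
#include "postgres.h"

#include "access/xlogdefs.h"
#include "miscadmin.h"
#include "replication/walsender_private.h"
#include "storage/condition_variable.h"
#include "utils/wait_event.h"

/* Hypothetical predicate; see the note above. */
extern bool StandbyConfirmedFlush(XLogRecPtr lsn);

/* Hard-coded sleep between rechecks, per the suggestion above. */
#define STANDBY_CONFIRM_SLEEP_MS	1000L

static void
WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn)
{
	ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv);

	while (!StandbyConfirmedFlush(wait_for_lsn))
	{
		/* React to shutdown requests and other interrupts. */
		CHECK_FOR_INTERRUPTS();

		(void) ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv,
										   STANDBY_CONFIRM_SLEEP_MS,
										   WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION);
	}

	ConditionVariableCancelSleep();
}
```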
Here are some comments for the patch v50-0002. ====== GENERAL (I made a short study of all the ereports in this patch -- here are some findings) ~~~ 0.1 Don't need the parentheses. Checking all the ereports I see that half of them have the redundant parentheses and half of them do not; You might as well make them all use the new style where the extra parentheses are not needed. e.g. + ereport(LOG, + (errmsg("skipping slot synchronization"), + errdetail("enable_syncslot is disabled."))); e.g. + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot drop replication slot \"%s\"", name), + errdetail("This slot is being synced from the primary server."))); and many more like this. Search for all the ereports. ~~~ 0.2 + ereport(LOG, + (errmsg("dropped replication slot \"%s\" of dbid %d as it " + "was not sync-ready", NameStr(s->data.name), + s->data.database))); I felt maybe that could be: errmsg("dropped replication slot \"%s\" of dbid %d", ... errdetail("It was not sync-ready.") (now this shares the same errmsg with another ereport) ~~~ 0.3. + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby."))); This seemed too wordy. Can't it be shortened (maybe like below) without losing any of the vital information? errmsg("skipping sync of slot \"%s\"", ...) errdetail("A user-created slot with the same name already exists on the standby.") ~~~ 0.4 + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("exiting from slot synchronization due to bad configuration"), + /* translator: second %s is a GUC variable name */ + errdetail("The primary slot \"%s\" specified by %s is not valid.", + PrimarySlotName, "primary_slot_name"))); /The primary slot/The primary server slot/ ~~~ 0.5 + ereport(ERROR, + (errmsg("could not fetch primary_slot_name \"%s\" info from the " + "primary: %s", PrimarySlotName, res->err))); /primary:/primary server:/ ~~~ 0.6 The continuations for long lines are inconsistent. Sometimes there are trailing spaces and sometimes there are leading spaces. And sometimes there are both at the same time which would cause double-spacing in the message! Please make them all the same. I think using leading spaces is easier but YMMV. e.g. + elog(ERROR, + "not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(slot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); ====== src/backend/replication/logical/slotsync.c 1. 
check_primary_info + /* No need to check further, return that we are cascading standby */ + if (remote_in_recovery) + { + *am_cascading_standby = true; + ExecClearTuple(tupslot); + walrcv_clear_result(res); + CommitTransactionCommand(); + return; + } + + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); + Assert(!isnull); + + if (!valid) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("exiting from slot synchronization due to bad configuration"), + /* translator: second %s is a GUC variable name */ + errdetail("The primary slot \"%s\" specified by %s is not valid.", + PrimarySlotName, "primary_slot_name"))); + ExecClearTuple(tupslot); + walrcv_clear_result(res); + CommitTransactionCommand(); +} Now that there is a common cleanup/return code this function be reduced further like below: SUGGESTION if (remote_in_recovery) { /* No need to check further, return that we are cascading standby */ *am_cascading_standby = true; } else { /* We are a normal standby. */ valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); Assert(!isnull); if (!valid) ... } ExecClearTuple(tupslot); walrcv_clear_result(res); CommitTransactionCommand(); } ~~~ 2. ReplSlotSyncWorkerMain + /* + * One can promote the standby and we can no longer be a cascading + * standby. So recheck here. + */ + if (am_cascading_standby) + check_primary_info(wrconn, &am_cascading_standby); Minor rewording of that new comment. SUGGESTION If the standby was promoted then what was previously a cascading standby might no longer be one, so recheck each time. ====== src/test/recovery/t/050_verify_slot_order.pl 3. +################################################## +# Test that a synchronized slot can not be decoded, altered and dropped by the user +################################################## /and dropped/or dropped/ ~~~ 4. + +($result, $stdout, $stderr) = $standby1->psql( + 'postgres', + qq[ALTER_REPLICATION_SLOT lsub1_slot (failover);], + replication => 'database'); +ok($stderr =~ /ERROR: cannot alter replication slot "lsub1_slot"/, + "synced slot on standby cannot be altered"); + Add a comment for this test part SUGGESTION Attempting to alter a synced slot should result in an error ~~~ 5. IMO it would be better if the tests were done in the same order mentioned in the comment. So either change the tests or change the comment. ====== Kind Regards, Peter Smith. Fujitsu Australia
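For reference, the "new style" mentioned in comment 0.1 simply drops the extra parentheses around the auxiliary calls; a small sketch using one of the messages quoted above (the wrapper function is only for illustration):

```
#include "postgres.h"

/*
 * New-style ereport (PostgreSQL 12 and later): the errcode()/errmsg()/
 * errdetail() calls no longer need an extra set of parentheses.
 */
static void
report_cannot_drop_synced_slot(const char *name)
{
	ereport(ERROR,
			errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			errmsg("cannot drop replication slot \"%s\"", name),
			errdetail("This slot is being synced from the primary server."));
}
```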
Dear Amit, Shveta, > > walsender.c > > > > 01. WalSndWaitForStandbyConfirmation > > > > ``` > > + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); > > ``` > > > > It works well, but I'm not sure whether we should use > WalSndComputeSleeptime() > > because the function won't be called by walsender. > > > > I don't think it is correct to use this function because it is > walsender specific, for example, it uses 'last_reply_timestamp' which > won't be even initialized in the backend environment. We need to > probably use a different logic for sleep here or need to use a > hard-coded value. Oh, you are right. I haven't look until the func. > I think we should change the name of functions like > WalSndWaitForStandbyConfirmation() as they are no longer used by > walsender. IIRC, earlier, we had a common logic to wait from both > walsender and SQL APIs which led to this naming but that is no longer > true with the latest patch. How about "WaitForStandbyConfirmation", which is simpler? There are some functions like "WaitForParallelWorkersToFinish", "WaitForProcSignalBarrier" and so on. > > 02.WalSndWaitForStandbyConfirmation > > > > ``` > > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, > sleeptime, > > + > WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION) > > ``` > > > > Hmm, is it OK to use the same event as WalSndWaitForWal()? IIUC it should be > avoided. > > > > Agreed. So, how about using > WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION > so that we can use it both from the backend and walsender? Seems right. Note again that a description of .txt file must be also fixed. Anyway, further comments on v50-0001. ~~~~~ protocol.sgml 01. create_replication_slot ``` + <varlistentry> + <term><literal>FAILOVER { 'true' | 'false' }</literal></term> + <listitem> + <para> + If true, the slot is enabled to be synced to the physical + standbys so that logical replication can be resumed after failover. + </para> + </listitem> + </varlistentry> ``` IIUC, the true/false is optional. libpqwalreceiver does not add the boolean. Also you can follow the notation of `TWO_PHASE`. 02. alter_replication_slot ``` + <variablelist> + <varlistentry> + <term><literal>FAILOVER { 'true' | 'false' }</literal></term> + <listitem> + <para> + If true, the slot is enabled to be synced to the physical + standbys so that logical replication can be resumed after failover. + </para> + </listitem> + </varlistentry> + </variablelist> ``` Apart from above, this boolean is mandatory, right? But you can follow other notation. ~~~~~~~ slot.c 03. validate_standby_slots ``` + /* Need a modifiable copy of string. */ ... + /* Verify syntax and parse string into a list of identifiers. */ ``` Unnecessary comma? 04. validate_standby_slots ``` + if (!ok || !ReplicationSlotCtl) + { + pfree(rawname); + list_free(elemlist); + return ok; + } ``` It may be more efficient to exit earlier when ReplicationSlotCtl is NULL. ~~~~~~~ walsender.c 05. PhysicalWakeupLogicalWalSnd ``` +/* + * Wake up the logical walsender processes with failover-enabled slots if the + * physical slot of the current walsender is specified in standby_slot_names + * GUC. + */ +void +PhysicalWakeupLogicalWalSnd(void) ``` The function can be called from backend processes, but you said "the current walsender" in the comment. 06. 
WalSndRereadConfigAndReInitSlotList ``` + char *pre_standby_slot_names; + + ProcessConfigFile(PGC_SIGHUP); + + /* + * If we are running on a standby, there is no need to reload + * standby_slot_names since we do not support syncing slots to cascading + * standbys. + */ + if (RecoveryInProgress()) + return; + + pre_standby_slot_names = pstrdup(standby_slot_names); ``` I felt that we must preserve pre_standby_slot_names before calling ProcessConfigFile(). 07. WalSndFilterStandbySlots I felt the prefix "WalSnd" may not be needed because both backend processes and walsender will call the function. Best Regards, Hayato Kuroda FUJITSU LIMITED
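As a rough illustration of the validate_standby_slots() discussion (comments 03 and 04), here is a minimal sketch of a GUC check hook for a comma-separated list of slot names. Only the syntax check is shown, which has to run even when ReplicationSlotCtl has not been set up yet; the hook name is hypothetical.

```
#include "postgres.h"

#include "nodes/pg_list.h"
#include "utils/guc.h"
#include "utils/varlena.h"

static bool
check_slot_name_list(char **newval, void **extra, GucSource source)
{
	char	   *rawname;
	List	   *elemlist;
	bool		ok;

	/* Need a modifiable copy of the string */
	rawname = pstrdup(*newval);

	/* Verify syntax and parse the string into a list of identifiers */
	ok = SplitIdentifierString(rawname, ',', &elemlist);
	if (!ok)
		GUC_check_errdetail("List syntax is invalid.");

	pfree(rawname);
	list_free(elemlist);

	return ok;
}
```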
On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Thanks for reviewing. I have addressed these in v50. > > > > I was looking at this patch to see if something smaller could be > independently committable. I think we can extract > pg_get_slot_invalidation_cause() and commit it as that function could > be independently useful as well. What do you think? > Sure, forked another thread [1] [1]: https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com thanks Shveta
On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > I was looking at this patch to see if something smaller could be > > independently committable. I think we can extract > > pg_get_slot_invalidation_cause() and commit it as that function could > > be independently useful as well. What do you think? > > > > Sure, forked another thread [1] > [1]: https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > Thanks, thinking more, we can split the patch into the following three patches which can be committed separately: (a) allowing the failover property to be set for a slot via SQL API and subscription commands, (b) the sync slot worker infrastructure, and (c) the GUC standby_slot_names and the corresponding wait logic on the server side. Thoughts? -- With Regards, Amit Kapila.
On Wednesday, December 20, 2023 4:03 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote: Hi, > > Dear Amit, Shveta, > > > > walsender.c > > > > > > 01. WalSndWaitForStandbyConfirmation > > > > > > ``` > > > + sleeptime = > WalSndComputeSleeptime(GetCurrentTimestamp()); > > > ``` > > > > > > It works well, but I'm not sure whether we should use > > WalSndComputeSleeptime() > > > because the function won't be called by walsender. > > > > > > > I don't think it is correct to use this function because it is > > walsender specific, for example, it uses 'last_reply_timestamp' which > > won't be even initialized in the backend environment. We need to > > probably use a different logic for sleep here or need to use a > > hard-coded value. > > Oh, you are right. I haven't look until the func. > > > I think we should change the name of functions like > > WalSndWaitForStandbyConfirmation() as they are no longer used by > > walsender. IIRC, earlier, we had a common logic to wait from both > > walsender and SQL APIs which led to this naming but that is no longer > > true with the latest patch. > > How about "WaitForStandbyConfirmation", which is simpler? There are some > functions like "WaitForParallelWorkersToFinish", "WaitForProcSignalBarrier" > and so on. Thanks for the comments. I think WaitForStandbyConfirmation is OK. And I removed the WalSnd prefix for these functions and move them to slot.c where the standby_slot_names is declared. > > > > 02.WalSndWaitForStandbyConfirmation > > > > > > ``` > > > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, > > sleeptime, > > > + > > WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION) > > > ``` > > > > > > Hmm, is it OK to use the same event as WalSndWaitForWal()? IIUC it > > > should be > > avoided. > > > > > > > Agreed. So, how about using > > WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION > > so that we can use it both from the backend and walsender? > > Seems right. Note again that a description of .txt file must be also fixed. Changed. > > Anyway, further comments on v50-0001. > > ~~~~~ > protocol.sgml > > 01. create_replication_slot > > ``` > + <varlistentry> > + <term><literal>FAILOVER { 'true' | 'false' }</literal></term> > + <listitem> > + <para> > + If true, the slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed after failover. > + </para> > + </listitem> > + </varlistentry> > ``` > > IIUC, the true/false is optional. libpqwalreceiver does not add the boolean. > Also you can follow the notation of `TWO_PHASE`. Changed. > > 02. alter_replication_slot > > ``` > + <variablelist> > + <varlistentry> > + <term><literal>FAILOVER { 'true' | 'false' }</literal></term> > + <listitem> > + <para> > + If true, the slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed after failover. > + </para> > + </listitem> > + </varlistentry> > + </variablelist> > ``` > > Apart from above, this boolean is mandatory, right? > But you can follow other notation. > Right, changed it to optional to be consistent with others. > > ~~~~~~~ > slot.c > > 03. validate_standby_slots > > ``` > + /* Need a modifiable copy of string. */ > ... > + /* Verify syntax and parse string into a list of identifiers. */ > ``` > > Unnecessary comma? You mean comma or period ? I think the current style is OK. > > 04. 
validate_standby_slots > > ``` > + if (!ok || !ReplicationSlotCtl) > + { > + pfree(rawname); > + list_free(elemlist); > + return ok; > + } > ``` > > It may be more efficient to exit earlier when ReplicationSlotCtl is NULL. I think even if ReplicationSlotCtl is NULL, we still need to check the syntax of the slot names. > > ~~~~~~~ > walsender.c > > 05. PhysicalWakeupLogicalWalSnd > > ``` > +/* > + * Wake up the logical walsender processes with failover-enabled slots > +if the > + * physical slot of the current walsender is specified in > +standby_slot_names > + * GUC. > + */ > +void > +PhysicalWakeupLogicalWalSnd(void) > ``` > > The function can be called from backend processes, but you said "the current > walsender" > in the comment. Changed the words. > > 06. WalSndRereadConfigAndReInitSlotList > > ``` > + char *pre_standby_slot_names; > + > + ProcessConfigFile(PGC_SIGHUP); > + > + /* > + * If we are running on a standby, there is no need to reload > + * standby_slot_names since we do not support syncing slots to > cascading > + * standbys. > + */ > + if (RecoveryInProgress()) > + return; > + > + pre_standby_slot_names = pstrdup(standby_slot_names); > ``` > > I felt that we must preserve pre_standby_slot_names before calling > ProcessConfigFile(). > Good catch. Fixed. > > 07. WalSndFilterStandbySlots > > I felt the prefix "WalSnd" may not be needed because both backend processes > and walsender will call the function. Right, renamed. Attach the V51 patch set which addressed Kuroda-san's comments. I also tried to improve the test in 0003 to make it stable. Best Regards, Hou zj
Attachment
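Regarding comment 05 above (PhysicalWakeupLogicalWalSnd being callable from backends as well as walsenders), the wakeup side boils down to a broadcast on the shared condition variable, and nothing about it is walsender-specific. A minimal sketch, where SlotIsInStandbySlotNames() and the cv argument stand in for the patch's own names:

```
#include "postgres.h"

#include "replication/slot.h"
#include "storage/condition_variable.h"

/* Hypothetical: is this physical slot listed in standby_slot_names? */
extern bool SlotIsInStandbySlotNames(const char *slotname);

/*
 * Once a physical standby confirms receipt of WAL, broadcast on the
 * condition variable that waiting logical walsenders (or backends) are
 * sleeping on.
 */
static void
WakeupConfirmationWaitersSketch(ReplicationSlot *physical_slot,
								ConditionVariable *cv)
{
	if (SlotIsInStandbySlotNames(NameStr(physical_slot->data.name)))
		ConditionVariableBroadcast(cv);
}
```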
On Tuesday, December 19, 2023 9:05 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > I resumed to review the patch. I will play more about it, but I can post some > cosmetic comments. Thanks for the comments. > > ==== > walsender.c > > 01. WalSndWaitForStandbyConfirmation > > ``` > + sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp()); > ``` > > It works well, but I'm not sure whether we should use > WalSndComputeSleeptime() > because the function won't be called by walsender. Changed to a hard-coded value. > > 02.WalSndWaitForStandbyConfirmation > > ``` > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, > sleeptime, > + > WAIT_EVENT_WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION) > ``` > > Hmm, is it OK to use the same event as WalSndWaitForWal()? IIUC it should be > avoided. As discussed, I change the event name to a more common one, so that it makes sense to use it in both places. > > 03. WalSndShmemInit() > > ``` > + > + ConditionVariableInit(&WalSndCtl->wal_confirm_rcv_cv); > ``` > > Unnecessary blank? Removed. > > ~~~~~ > 050_standby_failover_slots_sync.pl > > 04. General > > My pgperltidy modified your test. Please check. Will run this in next version. > > 05. > > ``` > # Create publication on the primary > ``` > > Missing "a" before publication? Changed. > > 06. > > ``` > $subscriber1->init(allows_streaming => 'logical'); > ... > $subscriber2->init(allows_streaming => 'logical'); > ``` > > IIUC, these settings are not needed. Yeah, removed. > > 07. > > ``` > my $primary_insert_time = time(); > ``` > > The variable is not used. Removed. > > 08. > > ``` > # Stop the standby associated with the specified physical replication slot so > # that the logical replication slot won't receive changes until the standby > # slot's restart_lsn is advanced or the slot is removed from the > # standby_slot_names list > ``` > > Missing comma? Added. > > 09. > > ``` > $back_q->query_until(qr//, > "SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);\n"); > ``` > > Not sure, should we have to close the back_q connection? Added the quit. > > 10. > > ``` > # Remove the standby from the standby_slot_names list and reload the > # configuration > $primary->adjust_conf('postgresql.conf', 'standby_slot_names', "''"); > $primary->psql('postgres', "SELECT pg_reload_conf()"); > ``` > a. > Missing comma? > > b. > I counted and reload function in perl (e.g., `$primary->reload;`) is more often > to > be used. Do you have a reason to use pg_reload_conf()? I think it was copied from other places, changed to ->reload. > > 11. > > ``` > # Now that the standby lsn has advanced, the primary must send the decoded > # changes to the subscription. > $publisher->wait_for_catchup('regress_mysub1'); > ``` > > Is the comment correct? I think primary sends data because the GUC is > modified. Fixed. > > 12. > > ``` > # Put the standby back on the primary_slot_name for the rest of the tests > $primary->adjust_conf('postgresql.conf', 'standby_slot_names', 'sb1_slot'); > $primary->restart(); > ``` > > Just to confirm - you used restart() here because we must ensure the GUC > change is > propagated to all backends, right? Yes, but I think restart is not necessary, so I changed it to reload. > > ~~~~~ > wait_event_names.txt > > 13. > > ``` > +WAL_SENDER_WAIT_FOR_STANDBY_CONFIRMATION "Waiting for the > WAL to be received by physical standby in WAL sender process." > ``` > > But there is a possibility that backend processes may wait with the event, right? Adjusted. Best Regards, Hou zj
On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V51 patch set which addressed Kuroda-san's comments. > I also tried to improve the test in 0003 to make it stable. The patches conflict with a recent commit dc21234. Here is the rebased V51_2 version; there are no code changes in this version. Best Regards, Hou zj
Attachment
Here is a minor comment for v51-0001 ====== src/backend/replication/slot.c 1. +void +RereadConfigAndReInitSlotList(List **standby_slots) +{ + char *pre_standby_slot_names; + + /* + * If we are running on a standby, there is no need to reload + * standby_slot_names since we do not support syncing slots to cascading + * standbys. + */ + if (RecoveryInProgress()) + { + ProcessConfigFile(PGC_SIGHUP); + return; + } + + pre_standby_slot_names = pstrdup(standby_slot_names); + + ProcessConfigFile(PGC_SIGHUP); + + if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) + { + list_free(*standby_slots); + *standby_slots = GetStandbySlotList(true); + } + + pfree(pre_standby_slot_names); +} Consider below, which seems a simpler way to do that but with just one return point and without duplicating the ProcessConfigFile calls: SUGGESTION { char *pre_standby_slot_names = pstrdup(standby_slot_names); ProcessConfigFile(PGC_SIGHUP); if (!RecoveryInProgress()) { if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) { list_free(*standby_slots); *standby_slots = GetStandbySlotList(true); } } pfree(pre_standby_slot_names); } ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thursday, December 21, 2023 12:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here is a minor comment for v51-0001 > > ====== > src/backend/replication/slot.c > > 1. > +void > +RereadConfigAndReInitSlotList(List **standby_slots) { > + char *pre_standby_slot_names; > + > + /* > + * If we are running on a standby, there is no need to reload > + * standby_slot_names since we do not support syncing slots to > + cascading > + * standbys. > + */ > + if (RecoveryInProgress()) > + { > + ProcessConfigFile(PGC_SIGHUP); > + return; > + } > + > + pre_standby_slot_names = pstrdup(standby_slot_names); > + > + ProcessConfigFile(PGC_SIGHUP); > + > + if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) { > + list_free(*standby_slots); *standby_slots = GetStandbySlotList(true); > + } > + > + pfree(pre_standby_slot_names); > +} > > Consider below, which seems a simpler way to do that but with just one return > point and without duplicating the ProcessConfigFile calls: > > SUGGESTION > { > char *pre_standby_slot_names = pstrdup(standby_slot_names); > > ProcessConfigFile(PGC_SIGHUP); > > if (!RecoveryInProgress()) > { > if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) > { > list_free(*standby_slots); > *standby_slots = GetStandbySlotList(true); > } > } > > pfree(pre_standby_slot_names); > } Thanks for the suggestion. I also thought about this, but I'd like to avoid allocating/freeing memory for the pre_standby_slot_names if not needed. Best Regards, Hou zj
On Thu, Dec 21, 2023 at 11:30 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, December 21, 2023 12:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here is a minor comment for v51-0001 > > > > ====== > > src/backend/replication/slot.c > > > > 1. > > +void > > +RereadConfigAndReInitSlotList(List **standby_slots) { > > + char *pre_standby_slot_names; > > + > > + /* > > + * If we are running on a standby, there is no need to reload > > + * standby_slot_names since we do not support syncing slots to > > + cascading > > + * standbys. > > + */ > > + if (RecoveryInProgress()) > > + { > > + ProcessConfigFile(PGC_SIGHUP); > > + return; > > + } > > + > > + pre_standby_slot_names = pstrdup(standby_slot_names); > > + > > + ProcessConfigFile(PGC_SIGHUP); > > + > > + if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) { > > + list_free(*standby_slots); *standby_slots = GetStandbySlotList(true); > > + } > > + > > + pfree(pre_standby_slot_names); > > +} > > > > Consider below, which seems a simpler way to do that but with just one return > > point and without duplicating the ProcessConfigFile calls: > > > > SUGGESTION > > { > > char *pre_standby_slot_names = pstrdup(standby_slot_names); > > > > ProcessConfigFile(PGC_SIGHUP); > > > > if (!RecoveryInProgress()) > > { > > if (strcmp(pre_standby_slot_names, standby_slot_names) != 0) > > { > > list_free(*standby_slots); > > *standby_slots = GetStandbySlotList(true); > > } > > } > > > > pfree(pre_standby_slot_names); > > } > > Thanks for the suggestion. I also thought about this, but I'd like to avoid > allocating/freeing memory for the pre_standby_slot_names if not needed. > > Best Regards, > Hou zj > > PFA v52. Changes are: 1) Addressed comments given for v48-002 in [1] and v50-002 in [2] 2) Merged patch003 (test improvement) to patch002 itself. 3) Restructured code around ReplicationSlotDrop to remove extra arg 'user_cmd' 4) Fixed a bug wherein promotion flow was breaking. The pid of slot-sync worker was nullified in slotsync_worker_onexit() before the worker can release the acquired slot in ReplicationSlotShmemExit(). Due to this, the startup process which relies on worker's pid tried to drop the 'i' state slots assuming the slot sync worker has stopped whereas the slot sync worker was trying to modify the slot concurrently, resulting into the problem. This was due to the fact that slotsync_worker_onexit() was registered with before_shmem_exit(). It should instead be registered using on_shmem_exit(). Corrected it now. Thanks Hou-San for working on this. [1]: https://www.postgresql.org/message-id/CAA4eK1J5zTmm4NE4os59WgU4AZPNb74X-n67pY8SkoDfzsN_jA%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAHut%2BPvocO_bwwz7kD-4mLnFRCLOK3i0ocLyGDvLQKzkhzEjTg%40mail.gmail.com
Attachment
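For readers following item 4) above, the fix relies on the documented ordering of shared-memory exit callbacks: all before_shmem_exit() callbacks, including ReplicationSlotShmemExit() which releases any slot the process still holds, run before the on_shmem_exit() callbacks, so clearing the worker's pid belongs in an on_shmem_exit() callback. A minimal sketch of the registration; the callback body is elided and the function names only loosely mirror the patch.

```
#include "postgres.h"

#include "storage/ipc.h"

static void
slotsync_worker_onexit_sketch(int code, Datum arg)
{
	/* clear SlotSyncWorker->pid under its spinlock (omitted in this sketch) */
}

static void
register_slotsync_cleanup_sketch(void)
{
	/* runs only after all before_shmem_exit() callbacks have completed */
	on_shmem_exit(slotsync_worker_onexit_sketch, (Datum) 0);
}
```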
On Wed, Dec 20, 2023 at 12:02 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some comments for the patch v50-0002. Thank You for the feedback. I have addressed these in v52. > ====== > GENERAL > > (I made a short study of all the ereports in this patch -- here are > some findings) > > ~~~ > > 0.1 Don't need the parentheses. > > Checking all the ereports I see that half of them have the redundant > parentheses and half of them do not; You might as well make them all > use the new style where the extra parentheses are not needed. > > e.g. > + ereport(LOG, > + (errmsg("skipping slot synchronization"), > + errdetail("enable_syncslot is disabled."))); > > e.g. > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot drop replication slot \"%s\"", name), > + errdetail("This slot is being synced from the primary server."))); > > and many more like this. Search for all the ereports. > > ~~~ > > 0.2 > + ereport(LOG, > + (errmsg("dropped replication slot \"%s\" of dbid %d as it " > + "was not sync-ready", NameStr(s->data.name), > + s->data.database))); > > I felt maybe that could be: > > errmsg("dropped replication slot \"%s\" of dbid %d", ... > errdetail("It was not sync-ready.") > > (now this shares the same errmsg with another ereport) > > ~~~ > > 0.3. > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("skipping sync of slot \"%s\" as it is a user created" > + " slot", remote_slot->name), > + errdetail("This slot has failover enabled on the primary and" > + " thus is sync candidate but user created slot with" > + " the same name already exists on the standby."))); > > This seemed too wordy. Can't it be shortened (maybe like below) > without losing any of the vital information? > > errmsg("skipping sync of slot \"%s\"", ...) > errdetail("A user-created slot with the same name already exists on > the standby.") I have modified it a little bit more. Please see now. I wanted to add the info that slot-sync worker is exiting instead of skipping a slot and that the concerned slot is a failover slot on primary. These were the other comments around the same. > ~~~ > > 0.4 > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("exiting from slot synchronization due to bad configuration"), > + /* translator: second %s is a GUC variable name */ > + errdetail("The primary slot \"%s\" specified by %s is not valid.", > + PrimarySlotName, "primary_slot_name"))); > > /The primary slot/The primary server slot/ > > ~~~ > > 0.5 > + ereport(ERROR, > + (errmsg("could not fetch primary_slot_name \"%s\" info from the " > + "primary: %s", PrimarySlotName, res->err))); > > /primary:/primary server:/ > > ~~~ > > 0.6 > The continuations for long lines are inconsistent. Sometimes there are > trailing spaces and sometimes there are leading spaces. And sometimes > there are both at the same time which would cause double-spacing in > the message! Please make them all the same. I think using leading > spaces is easier but YMMV. > > e.g. > + elog(ERROR, > + "not synchronizing local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization " > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > ====== > src/backend/replication/logical/slotsync.c > > 1. 
check_primary_info > > + /* No need to check further, return that we are cascading standby */ > + if (remote_in_recovery) > + { > + *am_cascading_standby = true; > + ExecClearTuple(tupslot); > + walrcv_clear_result(res); > + CommitTransactionCommand(); > + return; > + } > + > + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > + Assert(!isnull); > + > + if (!valid) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("exiting from slot synchronization due to bad configuration"), > + /* translator: second %s is a GUC variable name */ > + errdetail("The primary slot \"%s\" specified by %s is not valid.", > + PrimarySlotName, "primary_slot_name"))); > + ExecClearTuple(tupslot); > + walrcv_clear_result(res); > + CommitTransactionCommand(); > +} > > Now that there is a common cleanup/return code this function be > reduced further like below: > > SUGGESTION > > if (remote_in_recovery) > { > /* No need to check further, return that we are cascading standby */ > *am_cascading_standby = true; > } > else > { > /* We are a normal standby. */ > > valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > Assert(!isnull); > > if (!valid) > ... > } > > ExecClearTuple(tupslot); > walrcv_clear_result(res); > CommitTransactionCommand(); > } > > ~~~ > > 2. ReplSlotSyncWorkerMain > > + /* > + * One can promote the standby and we can no longer be a cascading > + * standby. So recheck here. > + */ > + if (am_cascading_standby) > + check_primary_info(wrconn, &am_cascading_standby); > > Minor rewording of that new comment. > > SUGGESTION > If the standby was promoted then what was previously a cascading > standby might no longer be one, so recheck each time. > > ====== > src/test/recovery/t/050_verify_slot_order.pl > > 3. > +################################################## > +# Test that a synchronized slot can not be decoded, altered and > dropped by the user > +################################################## > > /and dropped/or dropped/ > > ~~~ > > 4. > + > +($result, $stdout, $stderr) = $standby1->psql( > + 'postgres', > + qq[ALTER_REPLICATION_SLOT lsub1_slot (failover);], > + replication => 'database'); > +ok($stderr =~ /ERROR: cannot alter replication slot "lsub1_slot"/, > + "synced slot on standby cannot be altered"); > + > > Add a comment for this test part > > SUGGESTION > Attempting to alter a synced slot should result in an error > > ~~~ > > 5. > IMO it would be better if the tests were done in the same order > mentioned in the comment. So either change the tests or change the > comment. > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
Hi, On Thu, Dec 21, 2023 at 02:23:12AM +0000, Zhijie Hou (Fujitsu) wrote: > On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > Attach the V51 patch set which addressed Kuroda-san's comments. > > I also tried to improve the test in 0003 to make it stable. > > The patches conflict with a recent commit dc21234. > Here is the rebased V51_2 version, there is no code changes in this version. > Thanks! I've a few remarks regarding 0001: 1 === In the commit message what about replacing "Allow logical walsenders to wait for the physical standbys" with "Force some logical walsenders to wait for the physical standbys"? Also I think it would be better to first explain what we are trying to achieve and after explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so on). 2 === + <listitem> + <para> + List of physical replication slots that logical replication slots with + failover enabled waits for. Worth to add a few words about what we are actually waiting for? 3 === + ereport(ERROR, + (errcode(ERRCODE_PROTOCOL_VIOLATION), + errmsg("could not alter replication slot \"%s\" on publisher: %s", + slotname, pchomp(PQerrorMessage(conn->streamConn))))); should we mention "on publisher" here, what about removing the word "publisher"? 4 === @@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int elevel) * during getting changes, if the two_phase option is enabled it can skip * prepare because by that time start decoding point has been moved. So the * user will only get commit prepared. + * failover: If enabled, allows the slot to be synced to physical standbys so + * that logical replication can be resumed after failover. s/allows/forces ? 5 === + bool ok; parse_ok maybe? 6 === + /* Need a modifiable copy of string. */ + rawname = pstrdup(*newval); It seems to me that the single line comments in the neighborhood functions (see RestoreSlotFromDisk() for example) don't finish with ".". Worth to follow the same format for all what we add in slot.c? 7 === +static void +parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover) ParseAlterReplSlotOptions instead? 8 === + * We do not need to change the failover to false if the server + * does not support failover (e.g. pre-PG17) Missing "." at the end. 9 === + * See comments above for twophasestate, same holds true for + * 'failover' Missing "." at the end. 10 === +++ b/src/include/replication/walsender.h @@ -12,6 +12,8 @@ #ifndef _WALSENDER_H #define _WALSENDER_H +#include "access/xlogdefs.h" Is this include needed? 11 === + * When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on another + * CV that is woken up by physical walsenders when the walreceiver has + * confirmed the receipt of LSN. s/that is woken up by/that is broadcasted by/ ? 12 === We are mentioning in several places that the replication can be resumed after a failover. Should we add a few words about possible lag? (see [1]) [1]: https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiGNyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Dear Shveta, Thanks for updating the patch! Here is my comments for v52-0002. ~~~~~ system-views.sgml 01. ``` + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>sync_state</structfield> <type>char</type> + </para> + <para> + Defines slot synchronization state. This is meaningful on the physical + standby which has configured <xref linkend="guc-enable-syncslot"/> = true. + Possible values are: + <itemizedlist> + <listitem> + <para><literal>n</literal> = none for user created slots, ... ``` Hmm. I'm not sure why we must show a single character to a user. I'm OK for pg_subscription.srsubstate because it is a "catalog" - the actual value would be recorded in the heap. But pg_replication_slot is just a view so that we can replace internal representations to other strings. E.g., pg_replication_slots.wal_status. How about using {none, initialized, ready} or something? ~~~~~ postmaster.c 02. bgworker_should_start_now ``` + if (start_time == BgWorkerStart_ConsistentState_HotStandby && + pmState != PM_RUN) + return true; ``` I'm not sure the second condition is really needed. The line will be executed when pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed around here? ~~~~~ libpqwalreceiver.c 03. PQWalReceiverFunctions ``` + .walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo, ``` Just to confirm - is there a rule for ordering? ~~~~~ slotsync.c 04. SlotSyncWorkerCtx ``` typedef struct SlotSyncWorkerCtx { pid_t pid; slock_t mutex; } SlotSyncWorkerCtx; SlotSyncWorkerCtx *SlotSyncWorker = NULL; ``` Per other files like launcher.c, should we use a name like "SlotSyncWorkerCtxStruct"? 05. SlotSyncWorkerRegister() Your coding will work well, but there is another approach which validates slotsync parameters here. In this case, the postmaster should exit ASAP. This can notify that there are some wrong settings to users earlier. Thought? 06. wait_for_primary_slot_catchup ``` + CHECK_FOR_INTERRUPTS(); + + /* Handle any termination request if any */ + ProcessSlotSyncInterrupts(wrconn); ``` ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call. 07. wait_for_primary_slot_catchup ``` + /* + * XXX: Is waiting for 2 seconds before retrying enough or more or + * less? + */ + rc = WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, + 2000L, + WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP); + + ResetLatch(MyLatch); + + /* Emergency bailout if postmaster has died */ + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1); ``` Is there any reasons not to use WL_EXIT_ON_PM_DEATH event? If not, you can use. 08. synchronize_slots ``` + SpinLockAcquire(&WalRcv->mutex); + if (!WalRcv || + (WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) + { ... ``` Assuming that WalRcv is still NULL. In this case, does the first SpinLockAcquire() lead a segmentation fault? 09. synchronize_slots ``` + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); ``` The query is not dynamical one, so I think no need to print even if the debug mode. 10. synchronize_one_slot IIUC, this function can synchronize slots even if the used plugin on primary is not installed on the secondary server. If the slot is created by the slotsync worker, users will recognize it after the server is promoted and the decode is starting. I felt it is not good specification. Can we detect in the validation phase? ~~~~~ not the source code 11. I tested the typical case - promoting a publisher from a below diagram. 
A physical replication slot "physical" was specified as standby_slot_names. ``` node A (primary) --> node B (secondary) | | node C (subscriber) ``` And after the promoting, below lines were periodically output on logfiles for node B and C. ``` WARNING: replication slot "physical" specified in parameter "standby_slot_names" does not exist, ignoring ``` Do you have idea to suppress the warning? IIUC it is a normal behavior of the walsender so that we cannot avoid the periodical outputs. The steps of the test was as follows: 1. stop the node A via pg_ctl stop 2. promota the node B via pg_ctl promote 3. change the connection string of the subscription via ALTER SUBSCRIPTION ... CONNECTION ... Best Regards, Hayato Kuroda FUJITSU LIMITED
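On comment 08, the point is simply that WalRcv must be tested before its spinlock is touched; a minimal sketch of the NULL-safe ordering (the helper name is made up):

```
#include "postgres.h"

#include "access/xlogdefs.h"
#include "replication/walreceiver.h"
#include "storage/spin.h"

/*
 * Test WalRcv itself before touching WalRcv->mutex, otherwise the
 * SpinLockAcquire() would dereference a NULL pointer.  The fields are
 * copied out and examined after the lock is released.
 */
static bool
walreceiver_ready_sketch(XLogRecPtr *latest_wal_end)
{
	bool		have_slot;

	if (!WalRcv)
		return false;

	SpinLockAcquire(&WalRcv->mutex);
	have_slot = (WalRcv->slotname[0] != '\0');
	*latest_wal_end = WalRcv->latestWalEnd;
	SpinLockRelease(&WalRcv->mutex);

	return have_slot && !XLogRecPtrIsInvalid(*latest_wal_end);
}
```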
On Thursday, December 21, 2023 5:39 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Dec 21, 2023 at 02:23:12AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > > Attach the V51 patch set which addressed Kuroda-san's comments. > > > I also tried to improve the test in 0003 to make it stable. > > > > The patches conflict with a recent commit dc21234. > > Here is the rebased V51_2 version, there is no code changes in this version. > > > > Thanks! > > I've a few remarks regarding 0001: Thanks for the comments! > > 1 === > > In the commit message what about replacing "Allow logical walsenders to wait > for the physical standbys" with "Force some logical walsenders to wait for the > physical standbys"? I feel 'Allow' is OK, as the GUC standby_slot_names is optional for user. ISTM, 'force' means we always wait for physical standbys regardless of the GUC. > > Also I think it would be better to first explain what we are trying to achieve and > after explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so > on). Noted. We are about to split the patches, so will improve each commit message after that. > > 4 === > > @@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int > elevel) > * during getting changes, if the two_phase option is enabled it can skip > * prepare because by that time start decoding point has been moved. So > the > * user will only get commit prepared. > + * failover: If enabled, allows the slot to be synced to physical standbys so > + * that logical replication can be resumed after failover. > > s/allows/forces ? I think whether the slot is synced also depends on the GUC setting on standby, so I feel 'allow' is fine here. > > 5 === > > + bool ok; > > parse_ok maybe? The flag is also used to store the slot type check result, so I feel 'ok' is better here. > > 6 === > > + /* Need a modifiable copy of string. */ > + rawname = pstrdup(*newval); > > It seems to me that the single line comments in the neighborhood functions > (see > RestoreSlotFromDisk() for example) don't finish with ".". Worth to follow the > same format for all what we add in slot.c? I felt we have both styles in slot.c, but it seems Kuroda-san also prefer removing the ".", so will address. > > 7 === > > +static void > +parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover) > > ParseAlterReplSlotOptions instead? I think it followed parseCreateReplSlotOptions, but I agree that it looks inconsistent with other names. Will address. > 11 === > > + * When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on > another > + * CV that is woken up by physical walsenders when the walreceiver has > + * confirmed the receipt of LSN. > > s/that is woken up by/that is broadcasted by/ ? Will reword the comment here. > > 12 === > > We are mentioning in several places that the replication can be resumed after a > failover. Should we add a few words about possible lag? (see [1]) > > [1]: > https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiG > NyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com It feels like the implementation detail to me, but noted. We will think more about the document. The comments not mentioned above look good to me. Best Regards, Hou zj
On Fri, Dec 22, 2023 at 3:11 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, December 21, 2023 5:39 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Dec 21, 2023 at 02:23:12AM +0000, Zhijie Hou (Fujitsu) wrote: > > > On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > Attach the V51 patch set which addressed Kuroda-san's comments. > > > > I also tried to improve the test in 0003 to make it stable. > > > > > > The patches conflict with a recent commit dc21234. > > > Here is the rebased V51_2 version, there is no code changes in this version. > > > > > > > Thanks! > > > > I've a few remarks regarding 0001: > > Thanks for the comments! > > > > > 1 === > > > > In the commit message what about replacing "Allow logical walsenders to wait > > for the physical standbys" with "Force some logical walsenders to wait for the > > physical standbys"? > > I feel 'Allow' is OK, as the GUC standby_slot_names is optional for user. ISTM, 'force' > means we always wait for physical standbys regardless of the GUC. > > > > > Also I think it would be better to first explain what we are trying to achieve and > > after explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so > > on). > > Noted. We are about to split the patches, so will improve each commit message after that. > > > > > 4 === > > > > @@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int > > elevel) > > * during getting changes, if the two_phase option is enabled it can skip > > * prepare because by that time start decoding point has been moved. So > > the > > * user will only get commit prepared. > > + * failover: If enabled, allows the slot to be synced to physical standbys so > > + * that logical replication can be resumed after failover. > > > > s/allows/forces ? > > I think whether the slot is synced also depends on the > GUC setting on standby, so I feel 'allow' is fine here. > > > > > 5 === > > > > + bool ok; > > > > parse_ok maybe? > > The flag is also used to store the slot type check result, so I feel 'ok' is > better here. > > > > > 6 === > > > > + /* Need a modifiable copy of string. */ > > + rawname = pstrdup(*newval); > > > > It seems to me that the single line comments in the neighborhood functions > > (see > > RestoreSlotFromDisk() for example) don't finish with ".". Worth to follow the > > same format for all what we add in slot.c? > > I felt we have both styles in slot.c, but it seems Kuroda-san also > prefer removing the ".", so will address. > > > > > 7 === > > > > +static void > > +parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover) > > > > ParseAlterReplSlotOptions instead? > > I think it followed parseCreateReplSlotOptions, but I agree that it looks > inconsistent with other names. Will address. > > > 11 === > > > > + * When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on > > another > > + * CV that is woken up by physical walsenders when the walreceiver has > > + * confirmed the receipt of LSN. > > > > s/that is woken up by/that is broadcasted by/ ? > > Will reword the comment here. > > > > > 12 === > > > > We are mentioning in several places that the replication can be resumed after a > > failover. Should we add a few words about possible lag? (see [1]) > > > > [1]: > > https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiG > > NyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com > > It feels like the implementation detail to me, but noted. 
We will think more > about the document. > > > The comments not mentioned above look good to me. > > Best Regards, > Hou zj PFA v53. Changes are: patch001: 1) Addressed comments in [1] for v51-001. Thanks Hou-san for working on this. patch002: 2) Addressed comments in [2] for v52-002. 3) Fixed CFBot failure. The failure was caused by an assert in wait_for_primary_slot_catchup() for null confirmed_lsn received. In wait_for_primary_slot_catchup(), we had an assumption that if restart_lsn is valid and 'conflicting' is also false, then we must have non-null confirmed_lsn. But this is not true. It is possible to get null values for confirmed_lsn and catalog_xmin if on the primary server the slot is just created with a valid restart_lsn and slot-sync worker has fetched the slot before the primary server could set valid confirmed_lsn and catalog_xmin. In pg_create_logical_replication_slot(), there is a small window between CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets restart_lsn and DecodingContextFindStartpoint() which sets confirmed_lsn. If the slot-sync worker fetches the slot in this window, confirmed_lsn received will be NULL. Corrected the code to remove assert and added one additional condition that confirmed_lsn should be valid before moving the slot to 'r'. [1]: https://www.postgresql.org/message-id/ZYQHvgBpH0GgQaJK%40ip-10-97-1-34.eu-west-3.compute.internal [2]: https://www.postgresql.org/message-id/TY3PR01MB98893274D5A4FD4F86CC04A0F595A%40TY3PR01MB9889.jpnprd01.prod.outlook.com thanks Shveta
Attachment
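To illustrate change 3) above, the guard amounts to treating a remote slot as sync-ready only once all of its positions have been set on the primary; a minimal sketch, with a stand-in struct rather than the patch's actual type:

```
#include "postgres.h"

#include "access/transam.h"
#include "access/xlogdefs.h"

/* Stand-in for the patch's own remote-slot representation. */
typedef struct RemoteSlotSketch
{
	XLogRecPtr	restart_lsn;
	XLogRecPtr	confirmed_lsn;
	TransactionId catalog_xmin;
	bool		conflicting;
} RemoteSlotSketch;

/*
 * A remote slot is sync-ready only once restart_lsn, confirmed_flush LSN
 * and catalog_xmin have all been set on the primary and the slot is not
 * conflicting.
 */
static bool
remote_slot_is_sync_ready(const RemoteSlotSketch *remote)
{
	return !remote->conflicting &&
		!XLogRecPtrIsInvalid(remote->restart_lsn) &&
		!XLogRecPtrIsInvalid(remote->confirmed_lsn) &&
		TransactionIdIsValid(remote->catalog_xmin);
}
```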
On Thu, Dec 21, 2023 at 6:37 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shveta, > > Thanks for updating the patch! Here is my comments for v52-0002. Thanks for the feedback Kuroda-san. I have addressed these in v53. > ~~~~~ > system-views.sgml > > 01. > > ``` > + > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>sync_state</structfield> <type>char</type> > + </para> > + <para> > + Defines slot synchronization state. This is meaningful on the physical > + standby which has configured <xref linkend="guc-enable-syncslot"/> = true. > + Possible values are: > + <itemizedlist> > + <listitem> > + <para><literal>n</literal> = none for user created slots, > ... > ``` > > Hmm. I'm not sure why we must show a single character to a user. I'm OK for > pg_subscription.srsubstate because it is a "catalog" - the actual value would be > recorded in the heap. But pg_replication_slot is just a view so that we can replace > internal representations to other strings. E.g., pg_replication_slots.wal_status. > How about using {none, initialized, ready} or something? Done. > ~~~~~ > postmaster.c > > 02. bgworker_should_start_now > > ``` > + if (start_time == BgWorkerStart_ConsistentState_HotStandby && > + pmState != PM_RUN) > + return true; > ``` > > I'm not sure the second condition is really needed. The line will be executed when > pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed around here? 'case PM_RUN:' is a fall-through and thus we need to have this second condition under 'case PM_HOT_STANDBY' for BgWorkerStart_ConsistentState_HotStandby to avoid the worker getting started on non-standby. > ~~~~~ > libpqwalreceiver.c > > 03. PQWalReceiverFunctions > > ``` > + .walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo, > ``` > > Just to confirm - is there a rule for ordering? No, I think. I am not aware of any. > ~~~~~ > slotsync.c > > 04. SlotSyncWorkerCtx > > ``` > typedef struct SlotSyncWorkerCtx > { > pid_t pid; > slock_t mutex; > } SlotSyncWorkerCtx; > > SlotSyncWorkerCtx *SlotSyncWorker = NULL; > ``` > > Per other files like launcher.c, should we use a name like "SlotSyncWorkerCtxStruct"? Modified. > 05. SlotSyncWorkerRegister() > > Your coding will work well, but there is another approach which validates > slotsync parameters here. In this case, the postmaster should exit ASAP. This can > notify that there are some wrong settings to users earlier. Thought? I think the postmaster should not exit. IMO, slot-sync worker being a child process of postmaster, should not control start or exit of postmaster. The worker should only exit itself if slot-sync GUCs are not set. Have you seen any other case where postmaster exits if any of its bgworker processes has invalid GUCs? > 06. wait_for_primary_slot_catchup > > ``` > + CHECK_FOR_INTERRUPTS(); > + > + /* Handle any termination request if any */ > + ProcessSlotSyncInterrupts(wrconn); > ``` > > ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to call. yes, removed. > 07. wait_for_primary_slot_catchup > > ``` > + /* > + * XXX: Is waiting for 2 seconds before retrying enough or more or > + * less? 
> + */ > + rc = WaitLatch(MyLatch, > + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, > + 2000L, > + WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP); > + > + ResetLatch(MyLatch); > + > + /* Emergency bailout if postmaster has died */ > + if (rc & WL_POSTMASTER_DEATH) > + proc_exit(1); > ``` > > Is there any reasons not to use WL_EXIT_ON_PM_DEATH event? If not, you can use. I think we should use WL_EXIT_ON_PM_DEATH. Corrected now. > 08. synchronize_slots > > ``` > + SpinLockAcquire(&WalRcv->mutex); > + if (!WalRcv || > + (WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) > + { > ... > ``` > > Assuming that WalRcv is still NULL. In this case, does the first SpinLockAcquire() > lead a segmentation fault? It may. Thanks for pointing this out. Modified. > 09. synchronize_slots > > ``` > + elog(DEBUG2, "slot sync worker's query:%s \n", s.data); > ``` > > The query is not dynamical one, so I think no need to print even if the debug > mode. Okay. Removed. > 10. synchronize_one_slot > > IIUC, this function can synchronize slots even if the used plugin on primary is > not installed on the secondary server. If the slot is created by the slotsync > worker, users will recognize it after the server is promoted and the decode is > starting. I felt it is not good specification. Can we detect in the validation > phase? Noted the concern. Let me review more on this. I will revert back. > ~~~~~ > not the source code > > 11. > > I tested the typical case - promoting a publisher from a below diagram. > A physical replication slot "physical" was specified as standby_slot_names. > > ``` > node A (primary) --> node B (secondary) > | > | > node C (subscriber) > ``` > > And after the promoting, below lines were periodically output on logfiles for > node B and C. > > ``` > WARNING: replication slot "physical" specified in parameter "standby_slot_names" does not exist, ignoring > ``` It seems like you have set standby_slot_names on the standby, that is why promoted standby is emitting this warning. It is not recommended to set it on standby as it is the primary GUC. Having said that, I understand that even on primary, we may get this repeated warning if standby_slot_names is not set correctly. This WARNING is intentional, as the user should know that this setting is wrong. So I am not sure if we should suppress this. I would like to know what others think on this. > Do you have idea to suppress the warning? IIUC it is a normal behavior of the > walsender so that we cannot avoid the periodical outputs. > > The steps of the test was as follows: > > 1. stop the node A via pg_ctl stop > 2. promota the node B via pg_ctl promote > 3. change the connection string of the subscription via ALTER SUBSCRIPTION ... CONNECTION ... > thanks Shveta
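For the WL_EXIT_ON_PM_DEATH change mentioned above, a minimal sketch of the simplified retry sleep; the wait-event name is the one used by the patch, not something that exists in core:

```
#include "postgres.h"

#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

/*
 * With WL_EXIT_ON_PM_DEATH the latch machinery exits the process itself
 * if the postmaster dies, so no explicit WL_POSTMASTER_DEATH check is
 * needed after the wait.
 */
static void
sleep_before_retry_sketch(void)
{
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 2000L,
					 WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);
	ResetLatch(MyLatch);

	CHECK_FOR_INTERRUPTS();
}
```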
Hi, On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote: > PFA v53. Changes are: Thanks! > patch002: > 2) Addressed comments in [2] for v52-002. > 3) Fixed CFBot failure. The failure was caused by an assert in > wait_for_primary_slot_catchup() for null confirmed_lsn received. In > wait_for_primary_slot_catchup(), we had an assumption that if > restart_lsn is valid and 'conflicting' is also false, then we must > have non-null confirmed_lsn. But this is not true. It is possible to > get null values for confirmed_lsn and catalog_xmin if on the primary > server the slot is just created with a valid restart_lsn and slot-sync > worker has fetched the slot before the primary server could set valid > confirmed_lsn and catalog_xmin. In > pg_create_logical_replication_slot(), there is a small window between > CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets > restart_lsn and DecodingContextFindStartpoint() which sets > confirmed_lsn. If the slot-sync worker fetches the slot in this > window, confirmed_lsn received will be NULL. Corrected the code to > remove assert and added one additional condition that confirmed_lsn > should be valid before moving the slot to 'r'. > Looking at v53-0002 commit message: It states: " If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped and recreated on the standby in next sync-cycle. " and one of the reasons mentioned is: " - The primary changes wal_level to a level lower than logical. " I think that as long as there is still a logical replication slot on the primary, that should not be possible. The primary should fail to start with messages like: " 2023-12-22 14:06:09.281 UTC [31824] FATAL: logical replication slot "logical_slot" exists, but wal_level < logical " Now, if: - The standby is shutdown - All the logical replication slots are removed on the primary - wal_level is set to < logical on the primary and it is restarted Then when the standby starts, the "synced" slots will be invalidated and later removed but not re-created on the next sync-cycle (because they don't exist anymore on the primary). Worth to reword a bit that part? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Dec 22, 2023 at 7:59 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote: > > PFA v53. Changes are: > > Thanks! > > > patch002: > > 2) Addressed comments in [2] for v52-002. > > 3) Fixed CFBot failure. The failure was caused by an assert in > > wait_for_primary_slot_catchup() for null confirmed_lsn received. In > > wait_for_primary_slot_catchup(), we had an assumption that if > > restart_lsn is valid and 'conflicting' is also false, then we must > > have non-null confirmed_lsn. But this is not true. It is possible to > > get null values for confirmed_lsn and catalog_xmin if on the primary > > server the slot is just created with a valid restart_lsn and slot-sync > > worker has fetched the slot before the primary server could set valid > > confirmed_lsn and catalog_xmin. In > > pg_create_logical_replication_slot(), there is a small window between > > CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets > > restart_lsn and DecodingContextFindStartpoint() which sets > > confirmed_lsn. If the slot-sync worker fetches the slot in this > > window, confirmed_lsn received will be NULL. Corrected the code to > > remove assert and added one additional condition that confirmed_lsn > > should be valid before moving the slot to 'r'. > > > > Looking at v53-0002 commit message: > > It states: > > " > If a logical slot on the primary is valid but is invalidated on the standby, > then that slot is dropped and recreated on the standby in next sync-cycle. > " > > and one of the reasons mentioned is: > > " > - The primary changes wal_level to a level lower than logical. > " > > I think that as long at there is still logical replication slot on the primary > that should not be possible. The primary should fail to start with messages like: > > " > 2023-12-22 14:06:09.281 UTC [31824] FATAL: logical replication slot "logical_slot" exists, but wal_level < logical > " Yes, right. It fails in such a case. > > Now, if: > > - The standby is shutdown > - All the logical replication slots are removed on the primary > - wal_level is set to < logical on the primary and it is restarted > > Then when the standby starts, the "synced" slots will be invalidated and later > removed but not re-created on the next sync-cycle (because they don't exist > anymore on the primary). > > Worth to reword a bit that part? yes, will change these details. Thanks! > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On Thu, Dec 21, 2023 at 6:37 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > 10. synchronize_one_slot > > IIUC, this function can synchronize slots even if the used plugin on primary is > not installed on the secondary server. If the slot is created by the slotsync > worker, users will recognize it after the server is promoted and the decode is > starting. I felt it is not good specification. Can we detect in the validation > phase? > I think we should be able to detect it if we want but do we want to add this restriction considering that users can always install the required plugins after standby gets promoted? I think we can do either way in this case but as we are not going to use these slots till the standby node is promoted, it seems okay to validate the plugins after promotion once users use the synced slots. -- With Regards, Amit Kapila.
Dear Amit, > I think we should be able to detect it if we want but do we want to > add this restriction considering that users can always install the > required plugins after standby gets promoted? I think we can do either > way in this case but as we are not going to use these slots till the > standby node is promoted, it seems okay to validate the plugins after > promotion once users use the synced slots. Personally it should be detected, but I want to hear opinions from others. Below are my reasons: 1) We can avoid a possibility that users miss the installation of plugins. Basically we should detect before the issue will really occur. 2) Rules around here might be inconsistent. Slots which will be synchronized can be created either way: a) manual creation via SQL function, or b) automatic creation by slotsync worker. In case of a), the decoding context is created when creation so that the plugin must be installed. Case b), however, we allow not to install beforehand. I felt it might be confused for users. Thought? Best Regards, Hayato Kuroda FUJITSU LIMITED
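If early detection were chosen, one conceivable check (not what the current patch does) would be to resolve the output plugin's init symbol the same way LoadOutputPlugin() does; load_external_function() already raises an error on its own if the library cannot be loaded at all. A sketch, with a hypothetical function name:

```
#include "postgres.h"

#include "fmgr.h"
#include "replication/output_plugin.h"

/*
 * Try to resolve the plugin's _PG_output_plugin_init symbol, the same
 * entry point LoadOutputPlugin() uses, and complain if it is missing.
 */
static void
validate_output_plugin_installed(const char *plugin)
{
	LogicalOutputPluginInit plugin_init;

	plugin_init = (LogicalOutputPluginInit)
		load_external_function(plugin, "_PG_output_plugin_init", false, NULL);

	if (plugin_init == NULL)
		ereport(ERROR,
				errmsg("output plugin \"%s\" does not provide _PG_output_plugin_init",
					   plugin));
}
```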
On Tue, Dec 26, 2023 at 3:00 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > I think we should be able to detect it if we want but do we want to > > add this restriction considering that users can always install the > > required plugins after standby gets promoted? I think we can do either > > way in this case but as we are not going to use these slots till the > > standby node is promoted, it seems okay to validate the plugins after > > promotion once users use the synced slots. > > Personally, I think it should be detected, but I want to hear opinions from others. > Below are my reasons: > > 1) > We can avoid the possibility that users miss installing the plugin. Basically, > we should detect the problem before it actually occurs. > > 2) > The rules here might otherwise be inconsistent. Slots which will be synchronized can be > created in either of two ways: > > a) manual creation via SQL function, or > b) automatic creation by the slotsync worker. > > In case a), the decoding context is created at slot creation time, so the plugin > must already be installed. In case b), however, we allow the plugin not to be installed > beforehand. I felt this might be confusing for users. Thoughts? > I think way (a) could lead to the setting of incorrect LSNs (restart_lsn and confirmed_flush_lsn) considering they are not copied from the primary. -- With Regards, Amit Kapila.
On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > independently committable. I think we can extract > > > pg_get_slot_invalidation_cause() and commit it as that function > > > could be independently useful as well. What do you think? > > > > > > > Sure, forked another thread [1] > > [1]: > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > N%3D > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > Thanks, thinking more, we can split the patch into the following three patches > which can be committed separately (a) Allowing the failover property to be set > for a slot via SQL API and subscription commands > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > corresponding wait logic in server-side. > > Thoughts? I agree. Here is the V54 patch set which was split based on the suggestion. The commit message in each patch is also improved. Best Regards, Hou zj
Attachment
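To make (a) above concrete, the failover property would be set roughly as follows under the proposed patches (slot, subscription, and connection values below are placeholders; this is the SQL surface the patch set proposes, not released behaviour):

    -- Proposed SQL API: create a logical slot marked as a failover candidate
    -- (the trailing boolean is the new 'failover' argument added by the patch).
    SELECT pg_create_logical_replication_slot('myslot', 'pgoutput',
                                               false,  -- temporary
                                               false,  -- two_phase
                                               true);  -- failover

    -- Proposed subscription option:
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=primary.example.com dbname=postgres user=repluser'
        PUBLICATION mypub
        WITH (failover = true);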
On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > > independently committable. I think we can extract > > > > pg_get_slot_invalidation_cause() and commit it as that function > > > > could be independently useful as well. What do you think? > > > > > > > > > > Sure, forked another thread [1] > > > [1]: > > > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > > N%3D > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > > > > Thanks, thinking more, we can split the patch into the following three patches > > which can be committed separately (a) Allowing the failover property to be set > > for a slot via SQL API and subscription commands > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > > corresponding wait logic in server-side. > > > > Thoughts? > > I agree. Here is the V54 patch set which was split based on the suggestion. > The commit message in each patch is also improved. > I would like to revisit the current dependency of slotsync worker on dbname used in 002 patch. Currently we accept dbname in primary_conninfo and thus the user has to make sure to provide one (by manually altering it) even in case of a conf file auto-generated by "pg_basebackup -R". Thus I would like to discuss if there are better ways to do it. Complete background is as follow: We need dbname for 2 purposes: 1) to connect to remote db in order to run SELECT queries to fetch the info needed by slotsync worker. 2) to make connection in slot-sync worker itself in order to be able to use libpq APIs for 1) We run 3 kind of select queries in slot-sync worker currently: a) To fetch all failover slots (logical slots) info at once in synchronize_slots(). b) To fetch a particular slot info during wait_for_primary_slot_catchup() logic (logical slot). c) To validate primary slot (physical one) and also to distinguish between standby and cascading standby by running pg_is_in_recovery(). 1) One approach to avoid dependency on dbname is using commands instead of SELECT. This will need implementing LIST_SLOTS command for a), and for b) we can use LIST_SLOTS and fetch everything (even though it is not needed) or have LIST_SLOTS with a filter on slot-name or extend READ_REPLICATION_SLOT, and for c) we can have some other command to get pg_is_in_recovery() info. But, I feel by relying on commands we will be making the extension of the slot-sync feature difficult. In future, if there is some more requirement to fetch any other info, then there too we have to implement a command. I am not sure if it is good and extensible approach. 2) Another way to avoid asking for a dbname in primary_conninfo is to use the default dbname internally. This brings us to two questions: 'How' and 'Which default db'? 2.1) To answer 'How': Using default dbname is simpler for the purpose of slot-sync worker having its own db-connection, but is a little tricky for the purpose of connection to remote_db. 
This is because we have to inject this dbname internally in our connection-info. 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), then currently it could have 2 formats: a) The simple "=" format for key-value pairs, example: 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. b) URI format, example: postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp We can distinguish between the 2 formats using 'uri_prefix_length' but injecting the dbname part will be messy specially for URI format. If we want to do it w/o injecting and only by changing libpq interfaces to accept dbname separately apart from conninfo, then there is no current simpler way available. It will need a good amount of changes in libpq. 2.1.2) Another way is to not rely on primary_conninfo directly but rely on 'WalRcv->conninfo' in order to connect to remote_db. This is because the latter is never URI format, it is some parsed format and appending may work. As an example, primary_conninfo = 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded internally is: "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer dbname=replication host=localhost port=5433 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable", '\000' So we can try appending our default dbname to this. But all the defaults loaded in WalRcv->conninfo need some careful analysis to figure out if they work for slot-sync worker case. 2.2) Now coming to 'Which default db': 2.2.1) If we use 'template1' as default db, it may block 'create db' operations on primary for the time when the slot-sync worker is connected to remote using this dbname. Example: postgres=# create database newdb1; ERROR: source database "template1" is being accessed by other users DETAIL: There is 1 other session using the database. 2.2.2) If we use 'postgres' as default db, there are chances that it can be dropped as unlike 'template1', it is allowed to be dropped by user, and if slotsync worker is connected to it, user may see: newdb1=# drop database postgres; ERROR: database "postgres" is being accessed by other users DETAIL: There is 1 other session using the database. But once the slot-sync worker or standby goes down, user can always drop this and next time slot-sync worker may not be able to come up. ================ As explained, there is no clean approach to avoid dbname dependency and thus making us implement it this way where we ask dbname in primary_conninfo. It will be good to know what others think on this and if there are better ways to do it. thanks Shveta
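For context, this is what the current patch expects the user to do by hand after "pg_basebackup -R", i.e. add a dbname to the generated setting (host and database below are placeholders):

    # postgresql.auto.conf on the standby, edited manually to add dbname:
    primary_conninfo = 'host=primary.example.com port=5432 user=replication dbname=postgres'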
Hi, Thank you for working on this. On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > > > wrote: > > > > > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > > > wrote: > > > > > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > > > independently committable. I think we can extract > > > > > pg_get_slot_invalidation_cause() and commit it as that function > > > > > could be independently useful as well. What do you think? > > > > > > > > > > > > > Sure, forked another thread [1] > > > > [1]: > > > > > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > > > N%3D > > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > > > > > > > Thanks, thinking more, we can split the patch into the following three patches > > > which can be committed separately (a) Allowing the failover property to be set > > > for a slot via SQL API and subscription commands > > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > > > corresponding wait logic in server-side. > > > > > > Thoughts? > > > > I agree. Here is the V54 patch set which was split based on the suggestion. > > The commit message in each patch is also improved. > > > > I would like to revisit the current dependency of slotsync worker on > dbname used in 002 patch. Currently we accept dbname in > primary_conninfo and thus the user has to make sure to provide one (by > manually altering it) even in case of a conf file auto-generated by > "pg_basebackup -R". > Thus I would like to discuss if there are better ways to do it. > Complete background is as follow: > > We need dbname for 2 purposes: > > 1) to connect to remote db in order to run SELECT queries to fetch the > info needed by slotsync worker. > 2) to make connection in slot-sync worker itself in order to be able > to use libpq APIs for 1) > > We run 3 kind of select queries in slot-sync worker currently: > > a) To fetch all failover slots (logical slots) info at once in > synchronize_slots(). > b) To fetch a particular slot info during > wait_for_primary_slot_catchup() logic (logical slot). > c) To validate primary slot (physical one) and also to distinguish > between standby and cascading standby by running pg_is_in_recovery(). > > 1) One approach to avoid dependency on dbname is using commands > instead of SELECT. This will need implementing LIST_SLOTS command for > a), and for b) we can use LIST_SLOTS and fetch everything (even though > it is not needed) or have LIST_SLOTS with a filter on slot-name or > extend READ_REPLICATION_SLOT, and for c) we can have some other > command to get pg_is_in_recovery() info. But, I feel by relying on > commands we will be making the extension of the slot-sync feature > difficult. In future, if there is some more requirement to fetch any > other info, > then there too we have to implement a command. I am not sure if it is > good and extensible approach. > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > use the default dbname internally. 
This brings us to two questions: > 'How' and 'Which default db'? > > 2.1) To answer 'How': > Using default dbname is simpler for the purpose of slot-sync worker > having its own db-connection, but is a little tricky for the purpose > of connection to remote_db. This is because we have to inject this > dbname internally in our connection-info. > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > then currently it could have 2 formats: > > a) The simple "=" format for key-value pairs, example: > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > b) URI format, example: > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > We can distinguish between the 2 formats using 'uri_prefix_length' but > injecting the dbname part will be messy specially for URI format. If > we want to do it w/o injecting and only by changing libpq interfaces > to accept dbname separately apart from conninfo, then there is no > current simpler way available. It will need a good amount of changes > in libpq. > > 2.1.2) Another way is to not rely on primary_conninfo directly but > rely on 'WalRcv->conninfo' in order to connect to remote_db. This is > because the latter is never URI format, it is some parsed format and > appending may work. As an example, primary_conninfo = > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > internally is: > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > dbname=replication host=localhost port=5433 > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > gssencmode=disable krbsrvname=postgres gssdelegation=0 > target_session_attrs=any load_balance_hosts=disable", '\000' > > So we can try appending our default dbname to this. But all the > defaults loaded in WalRcv->conninfo need some careful analysis to > figure out if they work for slot-sync worker case. > > 2.2) Now coming to 'Which default db': > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > operations on primary for the time when the slot-sync worker is > connected to remote using this dbname. Example: > > postgres=# create database newdb1; > ERROR: source database "template1" is being accessed by other users > DETAIL: There is 1 other session using the database. > > 2.2.2) If we use 'postgres' as default db, there are chances that it > can be dropped as unlike 'template1', it is allowed to be dropped by > user, and if slotsync worker is connected to it, user may see: > newdb1=# drop database postgres; > ERROR: database "postgres" is being accessed by other users > DETAIL: There is 1 other session using the database. > > But once the slot-sync worker or standby goes down, user can always > drop this and next time slot-sync worker may not be able to come up. > Other random ideas for discussion are: 3) The slotsync worker uses primary_conninfo but also uses a new GUC parameter, say slot_sync_dbname, to specify the database to connect. The slot_sync_dbname overwrites the dbname if primary_conninfo also specifies it. If both don't have a dbname, raise an error. 4) The slotsync worker uses a new GUC parameter, say slot_sync_conninfo, to specify the connection string to the primary aside from primary_conninfo. And pg_basebackup -R generates slot_sync_conninfo as well if required (new option required). BTW given that the slotsync worker executes only normal SQL queries, is there any reason why it uses a replication connection? 
It's slightly odd to me that the pg_stat_replication view shows one entry that remains in the "startup" state. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
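For reference, the effect described here is visible on the primary with a plain query (column names as in existing releases):

    -- On the primary: the slot-sync worker's connection shows up next to the
    -- walreceiver's, with the same application_name but stuck in "startup".
    SELECT pid, application_name, state
    FROM pg_stat_replication;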
On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > Thank you for working on this. > > On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > > > > wrote: > > > > > > > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > > > > wrote: > > > > > > > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > > > > wrote: > > > > > > > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > > > > independently committable. I think we can extract > > > > > > pg_get_slot_invalidation_cause() and commit it as that function > > > > > > could be independently useful as well. What do you think? > > > > > > > > > > > > > > > > Sure, forked another thread [1] > > > > > [1]: > > > > > > > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > > > > N%3D > > > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > > > > > > > > > > Thanks, thinking more, we can split the patch into the following three patches > > > > which can be committed separately (a) Allowing the failover property to be set > > > > for a slot via SQL API and subscription commands > > > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > > > > corresponding wait logic in server-side. > > > > > > > > Thoughts? > > > > > > I agree. Here is the V54 patch set which was split based on the suggestion. > > > The commit message in each patch is also improved. > > > > > > > I would like to revisit the current dependency of slotsync worker on > > dbname used in 002 patch. Currently we accept dbname in > > primary_conninfo and thus the user has to make sure to provide one (by > > manually altering it) even in case of a conf file auto-generated by > > "pg_basebackup -R". > > Thus I would like to discuss if there are better ways to do it. > > Complete background is as follow: > > > > We need dbname for 2 purposes: > > > > 1) to connect to remote db in order to run SELECT queries to fetch the > > info needed by slotsync worker. > > 2) to make connection in slot-sync worker itself in order to be able > > to use libpq APIs for 1) > > > > We run 3 kind of select queries in slot-sync worker currently: > > > > a) To fetch all failover slots (logical slots) info at once in > > synchronize_slots(). > > b) To fetch a particular slot info during > > wait_for_primary_slot_catchup() logic (logical slot). > > c) To validate primary slot (physical one) and also to distinguish > > between standby and cascading standby by running pg_is_in_recovery(). > > > > 1) One approach to avoid dependency on dbname is using commands > > instead of SELECT. This will need implementing LIST_SLOTS command for > > a), and for b) we can use LIST_SLOTS and fetch everything (even though > > it is not needed) or have LIST_SLOTS with a filter on slot-name or > > extend READ_REPLICATION_SLOT, and for c) we can have some other > > command to get pg_is_in_recovery() info. But, I feel by relying on > > commands we will be making the extension of the slot-sync feature > > difficult. 
In future, if there is some more requirement to fetch any > > other info, > > then there too we have to implement a command. I am not sure if it is > > good and extensible approach. > > > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > > use the default dbname internally. This brings us to two questions: > > 'How' and 'Which default db'? > > > > 2.1) To answer 'How': > > Using default dbname is simpler for the purpose of slot-sync worker > > having its own db-connection, but is a little tricky for the purpose > > of connection to remote_db. This is because we have to inject this > > dbname internally in our connection-info. > > > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > > then currently it could have 2 formats: > > > > a) The simple "=" format for key-value pairs, example: > > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > > b) URI format, example: > > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > > > We can distinguish between the 2 formats using 'uri_prefix_length' but > > injecting the dbname part will be messy specially for URI format. If > > we want to do it w/o injecting and only by changing libpq interfaces > > to accept dbname separately apart from conninfo, then there is no > > current simpler way available. It will need a good amount of changes > > in libpq. > > > > 2.1.2) Another way is to not rely on primary_conninfo directly but > > rely on 'WalRcv->conninfo' in order to connect to remote_db. This is > > because the latter is never URI format, it is some parsed format and > > appending may work. As an example, primary_conninfo = > > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > > internally is: > > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > > dbname=replication host=localhost port=5433 > > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > > gssencmode=disable krbsrvname=postgres gssdelegation=0 > > target_session_attrs=any load_balance_hosts=disable", '\000' > > > > So we can try appending our default dbname to this. But all the > > defaults loaded in WalRcv->conninfo need some careful analysis to > > figure out if they work for slot-sync worker case. > > > > 2.2) Now coming to 'Which default db': > > > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > > operations on primary for the time when the slot-sync worker is > > connected to remote using this dbname. Example: > > > > postgres=# create database newdb1; > > ERROR: source database "template1" is being accessed by other users > > DETAIL: There is 1 other session using the database. > > > > 2.2.2) If we use 'postgres' as default db, there are chances that it > > can be dropped as unlike 'template1', it is allowed to be dropped by > > user, and if slotsync worker is connected to it, user may see: > > newdb1=# drop database postgres; > > ERROR: database "postgres" is being accessed by other users > > DETAIL: There is 1 other session using the database. > > > > But once the slot-sync worker or standby goes down, user can always > > drop this and next time slot-sync worker may not be able to come up. > > > > Other random ideas for discussion are: > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > parameter, say slot_sync_dbname, to specify the database to connect. 
> The slot_sync_dbname overwrites the dbname if primary_conninfo also > specifies it. If both don't have a dbname, raise an error. > > 4) The slotsync worker uses a new GUC parameter, say > slot_sync_conninfo, to specify the connection string to the primary > aside from primary_conninfo. And pg_basebackup -R generates > slot_sync_conninfo as well if required (new option required). > > BTW given that the slotsync worker executes only normal SQL queries, > is there any reason why it uses a replication connection? Thank You for the feedback. Do you mean why are we using libpqwalreceiver.c APIs instead of using libpq directly? I was not aware if there is any way to connect if we want to run SQL queries. I initially tried using 'PQconnectdbParams' but couldn't make it work. Perhaps it is to be used only by front-end and extensions as the header files indicate as well: * libpq-fe.h : This file contains definitions for structures and externs for functions used by frontend postgres applications. * libpq-be-fe-helpers.h: Helper functions for using libpq in extensions . Code built directly into the backend is not allowed to link to libpq directly. Do you mean some other kind of connection here? thanks Shveta
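For anyone trying to follow the code, the walreceiver-API route described above looks roughly like the sketch below. This is not the patch's actual code: the signatures follow the current (PG 16-era) libpqwalreceiver API, the library is assumed to have been loaded with load_file("libpqwalreceiver", false), the usual walreceiver.h/pg_type_d.h includes are assumed, and the application name is a placeholder:

    char       *err = NULL;
    WalReceiverConn *wrconn;
    WalRcvExecResult *res;
    Oid         rettypes[1] = {BOOLOID};

    /* A logical connection requires a dbname in the conninfo -- which is
     * exactly the dependency being discussed in this thread. */
    wrconn = walrcv_connect(PrimaryConnInfo, true /* logical */,
                            false /* must_use_password */,
                            "slotsyncworker", &err);
    if (wrconn == NULL)
        ereport(ERROR,
                (errmsg("could not connect to the primary server: %s", err)));

    res = walrcv_exec(wrconn, "SELECT pg_is_in_recovery()", 1, rettypes);
    /* ... check res->status and read the returned tuplestore here ... */
    walrcv_clear_result(res);
    walrcv_disconnect(wrconn);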
On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > I would like to revisit the current dependency of slotsync worker on > > dbname used in 002 patch. Currently we accept dbname in > > primary_conninfo and thus the user has to make sure to provide one (by > > manually altering it) even in case of a conf file auto-generated by > > "pg_basebackup -R". > > Thus I would like to discuss if there are better ways to do it. > > Complete background is as follow: > > > > We need dbname for 2 purposes: > > > > 1) to connect to remote db in order to run SELECT queries to fetch the > > info needed by slotsync worker. > > 2) to make connection in slot-sync worker itself in order to be able > > to use libpq APIs for 1) > > > > We run 3 kind of select queries in slot-sync worker currently: > > > > a) To fetch all failover slots (logical slots) info at once in > > synchronize_slots(). > > b) To fetch a particular slot info during > > wait_for_primary_slot_catchup() logic (logical slot). > > c) To validate primary slot (physical one) and also to distinguish > > between standby and cascading standby by running pg_is_in_recovery(). > > > > 1) One approach to avoid dependency on dbname is using commands > > instead of SELECT. This will need implementing LIST_SLOTS command for > > a), and for b) we can use LIST_SLOTS and fetch everything (even though > > it is not needed) or have LIST_SLOTS with a filter on slot-name or > > extend READ_REPLICATION_SLOT, and for c) we can have some other > > command to get pg_is_in_recovery() info. But, I feel by relying on > > commands we will be making the extension of the slot-sync feature > > difficult. In future, if there is some more requirement to fetch any > > other info, > > then there too we have to implement a command. I am not sure if it is > > good and extensible approach. > > > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > > use the default dbname internally. This brings us to two questions: > > 'How' and 'Which default db'? > > > > 2.1) To answer 'How': > > Using default dbname is simpler for the purpose of slot-sync worker > > having its own db-connection, but is a little tricky for the purpose > > of connection to remote_db. This is because we have to inject this > > dbname internally in our connection-info. > > > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > > then currently it could have 2 formats: > > > > a) The simple "=" format for key-value pairs, example: > > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > > b) URI format, example: > > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > > > We can distinguish between the 2 formats using 'uri_prefix_length' but > > injecting the dbname part will be messy specially for URI format. If > > we want to do it w/o injecting and only by changing libpq interfaces > > to accept dbname separately apart from conninfo, then there is no > > current simpler way available. It will need a good amount of changes > > in libpq. > > > > 2.1.2) Another way is to not rely on primary_conninfo directly but > > rely on 'WalRcv->conninfo' in order to connect to remote_db. This is > > because the latter is never URI format, it is some parsed format and > > appending may work. 
As an example, primary_conninfo = > > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > > internally is: > > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > > dbname=replication host=localhost port=5433 > > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > > gssencmode=disable krbsrvname=postgres gssdelegation=0 > > target_session_attrs=any load_balance_hosts=disable", '\000' > > > > So we can try appending our default dbname to this. But all the > > defaults loaded in WalRcv->conninfo need some careful analysis to > > figure out if they work for slot-sync worker case. > > > > 2.2) Now coming to 'Which default db': > > > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > > operations on primary for the time when the slot-sync worker is > > connected to remote using this dbname. Example: > > > > postgres=# create database newdb1; > > ERROR: source database "template1" is being accessed by other users > > DETAIL: There is 1 other session using the database. > > > > 2.2.2) If we use 'postgres' as default db, there are chances that it > > can be dropped as unlike 'template1', it is allowed to be dropped by > > user, and if slotsync worker is connected to it, user may see: > > newdb1=# drop database postgres; > > ERROR: database "postgres" is being accessed by other users > > DETAIL: There is 1 other session using the database. > > > > But once the slot-sync worker or standby goes down, user can always > > drop this and next time slot-sync worker may not be able to come up. > > > > Other random ideas for discussion are: > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > parameter, say slot_sync_dbname, to specify the database to connect. > The slot_sync_dbname overwrites the dbname if primary_conninfo also > specifies it. If both don't have a dbname, raise an error. > Would the users prefer to provide a value for a separate GUC instead of changing primary_conninfo? It is possible that we can have some users prefer to use one GUC and others prefer a separate GUC but we should add a new GUC if we are sure that is what users would prefer. Also, even if have to consider this option, I think we can easily later add a new GUC to provide a dbname in addition to having the provision of giving it in primary_conninfo. Also, I think having a separate GUC for dbanme has some complexity in terms of appending the dbname to primary_conninfo as pointed out by Shveta. > 4) The slotsync worker uses a new GUC parameter, say > slot_sync_conninfo, to specify the connection string to the primary > aside from primary_conninfo. And pg_basebackup -R generates > slot_sync_conninfo as well if required (new option required). > Yeah, this is worth considering but won't slot_sync_conninfo be mostly a duplicate of primary_conninfo apart from dbname? I am not sure if the benefit outweighs the disadvantage of having mostly similar information in two GUCs. -- With Regards, Amit Kapila.
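Just to visualize idea (3) above, the configuration would look something like this; the GUC name and values are purely hypothetical, nothing below exists today:

    # Hypothetical, per idea (3): keep primary_conninfo dbname-free and point
    # the sync worker at a database via a separate setting.
    primary_conninfo = 'host=primary.example.com port=5432 user=replication'
    slot_sync_dbname = 'postgres'    # would override any dbname in primary_conninfo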
On Wed, Dec 27, 2023 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > I would like to revisit the current dependency of slotsync worker on > > > dbname used in 002 patch. Currently we accept dbname in > > > primary_conninfo and thus the user has to make sure to provide one (by > > > manually altering it) even in case of a conf file auto-generated by > > > "pg_basebackup -R". > > > Thus I would like to discuss if there are better ways to do it. > > > Complete background is as follow: > > > > > > We need dbname for 2 purposes: > > > > > > 1) to connect to remote db in order to run SELECT queries to fetch the > > > info needed by slotsync worker. > > > 2) to make connection in slot-sync worker itself in order to be able > > > to use libpq APIs for 1) > > > > > > We run 3 kind of select queries in slot-sync worker currently: > > > > > > a) To fetch all failover slots (logical slots) info at once in > > > synchronize_slots(). > > > b) To fetch a particular slot info during > > > wait_for_primary_slot_catchup() logic (logical slot). > > > c) To validate primary slot (physical one) and also to distinguish > > > between standby and cascading standby by running pg_is_in_recovery(). > > > > > > 1) One approach to avoid dependency on dbname is using commands > > > instead of SELECT. This will need implementing LIST_SLOTS command for > > > a), and for b) we can use LIST_SLOTS and fetch everything (even though > > > it is not needed) or have LIST_SLOTS with a filter on slot-name or > > > extend READ_REPLICATION_SLOT, and for c) we can have some other > > > command to get pg_is_in_recovery() info. But, I feel by relying on > > > commands we will be making the extension of the slot-sync feature > > > difficult. In future, if there is some more requirement to fetch any > > > other info, > > > then there too we have to implement a command. I am not sure if it is > > > good and extensible approach. > > > > > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > > > use the default dbname internally. This brings us to two questions: > > > 'How' and 'Which default db'? > > > > > > 2.1) To answer 'How': > > > Using default dbname is simpler for the purpose of slot-sync worker > > > having its own db-connection, but is a little tricky for the purpose > > > of connection to remote_db. This is because we have to inject this > > > dbname internally in our connection-info. > > > > > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > > > then currently it could have 2 formats: > > > > > > a) The simple "=" format for key-value pairs, example: > > > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > > > b) URI format, example: > > > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > > > > > We can distinguish between the 2 formats using 'uri_prefix_length' but > > > injecting the dbname part will be messy specially for URI format. If > > > we want to do it w/o injecting and only by changing libpq interfaces > > > to accept dbname separately apart from conninfo, then there is no > > > current simpler way available. It will need a good amount of changes > > > in libpq. > > > > > > 2.1.2) Another way is to not rely on primary_conninfo directly but > > > rely on 'WalRcv->conninfo' in order to connect to remote_db. 
This is > > > because the latter is never URI format, it is some parsed format and > > > appending may work. As an example, primary_conninfo = > > > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > > > internally is: > > > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > > > dbname=replication host=localhost port=5433 > > > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > > > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > > > gssencmode=disable krbsrvname=postgres gssdelegation=0 > > > target_session_attrs=any load_balance_hosts=disable", '\000' > > > > > > So we can try appending our default dbname to this. But all the > > > defaults loaded in WalRcv->conninfo need some careful analysis to > > > figure out if they work for slot-sync worker case. > > > > > > 2.2) Now coming to 'Which default db': > > > > > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > > > operations on primary for the time when the slot-sync worker is > > > connected to remote using this dbname. Example: > > > > > > postgres=# create database newdb1; > > > ERROR: source database "template1" is being accessed by other users > > > DETAIL: There is 1 other session using the database. > > > > > > 2.2.2) If we use 'postgres' as default db, there are chances that it > > > can be dropped as unlike 'template1', it is allowed to be dropped by > > > user, and if slotsync worker is connected to it, user may see: > > > newdb1=# drop database postgres; > > > ERROR: database "postgres" is being accessed by other users > > > DETAIL: There is 1 other session using the database. > > > > > > But once the slot-sync worker or standby goes down, user can always > > > drop this and next time slot-sync worker may not be able to come up. > > > > > > > Other random ideas for discussion are: > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > parameter, say slot_sync_dbname, to specify the database to connect. > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > specifies it. If both don't have a dbname, raise an error. > > > > Would the users prefer to provide a value for a separate GUC instead > of changing primary_conninfo? It is possible that we can have some > users prefer to use one GUC and others prefer a separate GUC but we > should add a new GUC if we are sure that is what users would prefer. > Also, even if have to consider this option, I think we can easily > later add a new GUC to provide a dbname in addition to having the > provision of giving it in primary_conninfo. > > Also, I think having a separate GUC for dbanme has some complexity in > terms of appending the dbname to primary_conninfo as pointed out by > Shveta. > > > 4) The slotsync worker uses a new GUC parameter, say > > slot_sync_conninfo, to specify the connection string to the primary > > aside from primary_conninfo. And pg_basebackup -R generates > > slot_sync_conninfo as well if required (new option required). > > > > Yeah, this is worth considering but won't slot_sync_conninfo be mostly > a duplicate of primary_conninfo apart from dbname? I am not sure if > the benefit outweighs the disadvantage of having mostly similar > information in two GUCs. > > -- > With Regards, > Amit Kapila. PFA v55. It has fixes for 2 CFBot failures seen on v53 and 1 CFBot failure seen on v54. 
patch002: 1) In a 32-bit environment, a Datum for an int64 is passed by reference (it holds a pointer), and thus the call below leads to a NULL pointer dereference if the concerned attribute is NULL. Corrected it now. DatumGetLSN(slot_getattr(tupslot, 3, &isnull)); 2) During slot creation on the standby, it is possible to get a NULL confirmed_lsn from the primary even for a valid slot with a valid restart_lsn. This may happen when a slot has just been created on the primary with a valid restart_lsn and the slot-sync worker fetched it before the primary could set a valid confirmed_lsn. And thus, along with waiting for the remote_slot's restart_lsn to catch up, we also need to check for a non-null confirmed_lsn of the remote_slot. patch003: 3) Another intermittent failure was due to an unstable test added in 050_standby_failover_slots_sync.pl. It has now been removed. The other tests already provide the coverage which the problematic test was trying to achieve. Thank you, Hou-san, for working on this. thanks Shveta
Attachment
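Regarding fix 1) above, the defensive pattern boils down to checking isnull before interpreting the Datum; a sketch (attribute number 3 is just the position from the example above, not a definitive layout):

    /* On 32-bit platforms an int64 Datum is pass-by-reference, so calling
     * DatumGetLSN() on a NULL attribute would dereference a NULL pointer. */
    bool        isnull;
    Datum       d = slot_getattr(tupslot, 3, &isnull);
    XLogRecPtr  restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);

    if (XLogRecPtrIsInvalid(restart_lsn))
        return;     /* remote slot not ready yet; skip it in this sync cycle */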
On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > I would like to revisit the current dependency of slotsync worker on > > > dbname used in 002 patch. Currently we accept dbname in > > > primary_conninfo and thus the user has to make sure to provide one (by > > > manually altering it) even in case of a conf file auto-generated by > > > "pg_basebackup -R". > > > Thus I would like to discuss if there are better ways to do it. > > > Complete background is as follow: > > > > > > We need dbname for 2 purposes: > > > > > > 1) to connect to remote db in order to run SELECT queries to fetch the > > > info needed by slotsync worker. > > > 2) to make connection in slot-sync worker itself in order to be able > > > to use libpq APIs for 1) > > > > > > We run 3 kind of select queries in slot-sync worker currently: > > > > > > a) To fetch all failover slots (logical slots) info at once in > > > synchronize_slots(). > > > b) To fetch a particular slot info during > > > wait_for_primary_slot_catchup() logic (logical slot). > > > c) To validate primary slot (physical one) and also to distinguish > > > between standby and cascading standby by running pg_is_in_recovery(). > > > > > > 1) One approach to avoid dependency on dbname is using commands > > > instead of SELECT. This will need implementing LIST_SLOTS command for > > > a), and for b) we can use LIST_SLOTS and fetch everything (even though > > > it is not needed) or have LIST_SLOTS with a filter on slot-name or > > > extend READ_REPLICATION_SLOT, and for c) we can have some other > > > command to get pg_is_in_recovery() info. But, I feel by relying on > > > commands we will be making the extension of the slot-sync feature > > > difficult. In future, if there is some more requirement to fetch any > > > other info, > > > then there too we have to implement a command. I am not sure if it is > > > good and extensible approach. > > > > > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > > > use the default dbname internally. This brings us to two questions: > > > 'How' and 'Which default db'? > > > > > > 2.1) To answer 'How': > > > Using default dbname is simpler for the purpose of slot-sync worker > > > having its own db-connection, but is a little tricky for the purpose > > > of connection to remote_db. This is because we have to inject this > > > dbname internally in our connection-info. > > > > > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > > > then currently it could have 2 formats: > > > > > > a) The simple "=" format for key-value pairs, example: > > > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > > > b) URI format, example: > > > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > > > > > We can distinguish between the 2 formats using 'uri_prefix_length' but > > > injecting the dbname part will be messy specially for URI format. If > > > we want to do it w/o injecting and only by changing libpq interfaces > > > to accept dbname separately apart from conninfo, then there is no > > > current simpler way available. It will need a good amount of changes > > > in libpq. > > > > > > 2.1.2) Another way is to not rely on primary_conninfo directly but > > > rely on 'WalRcv->conninfo' in order to connect to remote_db. 
This is > > > because the latter is never URI format, it is some parsed format and > > > appending may work. As an example, primary_conninfo = > > > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > > > internally is: > > > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > > > dbname=replication host=localhost port=5433 > > > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > > > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > > > gssencmode=disable krbsrvname=postgres gssdelegation=0 > > > target_session_attrs=any load_balance_hosts=disable", '\000' > > > > > > So we can try appending our default dbname to this. But all the > > > defaults loaded in WalRcv->conninfo need some careful analysis to > > > figure out if they work for slot-sync worker case. > > > > > > 2.2) Now coming to 'Which default db': > > > > > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > > > operations on primary for the time when the slot-sync worker is > > > connected to remote using this dbname. Example: > > > > > > postgres=# create database newdb1; > > > ERROR: source database "template1" is being accessed by other users > > > DETAIL: There is 1 other session using the database. > > > > > > 2.2.2) If we use 'postgres' as default db, there are chances that it > > > can be dropped as unlike 'template1', it is allowed to be dropped by > > > user, and if slotsync worker is connected to it, user may see: > > > newdb1=# drop database postgres; > > > ERROR: database "postgres" is being accessed by other users > > > DETAIL: There is 1 other session using the database. > > > > > > But once the slot-sync worker or standby goes down, user can always > > > drop this and next time slot-sync worker may not be able to come up. > > > > > > > Other random ideas for discussion are: > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > parameter, say slot_sync_dbname, to specify the database to connect. > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > specifies it. If both don't have a dbname, raise an error. > > > > Would the users prefer to provide a value for a separate GUC instead > of changing primary_conninfo? It is possible that we can have some > users prefer to use one GUC and others prefer a separate GUC but we > should add a new GUC if we are sure that is what users would prefer. > Also, even if have to consider this option, I think we can easily > later add a new GUC to provide a dbname in addition to having the > provision of giving it in primary_conninfo. I think having two separate GUCs is more flexible for example when users want to change the dbname to connect. It makes sense that the slotsync worker wants to use the same connection string as the walreceiver uses. But I guess today most primary_conninfo settings that are set manually or are generated by tools such as pg_basebackup don't have dbname. If we require a dbname in primary_conninfo, many tools will need to be changed. Once the connection string is generated, it would be tricky to change the dbname in it, as Shveta mentioned. The users will have to carefully select the database to connect when taking a base backup. > > Also, I think having a separate GUC for dbanme has some complexity in > terms of appending the dbname to primary_conninfo as pointed out by > Shveta. I think we don't necessarily need to append the dbname to the connection string in order to specify/change the database to connect. 
PQconnectdbParams() overrides the database name to connect if the dbname parameter appears twice in the connection keyword. The documentation[1] says: When expand_dbname is non-zero, the value for the first dbname key word is checked to see if it is a connection string. If so, it is “expanded” into the individual connection parameters extracted from the string. The value is considered to be a connection string, rather than just a database name, if it contains an equal sign (=) or it begins with a URI scheme designator. (More details on connection string formats appear in Section 33.1.1.) Only the first occurrence of dbname is treated in this way; any subsequent dbname parameter is processed as a plain database name. In general the parameter arrays are processed from start to end. If any key word is repeated, the last value (that is not NULL or empty) is used. This rule applies in particular when a key word found in a connection string conflicts with one appearing in the keywords array. Thus, the programmer may determine whether array entries can override or be overridden by values taken from a connection string. Array entries appearing before an expanded dbname entry can be overridden by fields of the connection string, and in turn those fields are overridden by array entries appearing after dbname (but, again, only if those entries supply non-empty values). If the slotsync worker needs to use libpqwalreceiver to connect the primary, we will need to change libpqrcv_connect(). But we have the infrastructure to change the database name to connect without changing the connection string, at least. > > > 4) The slotsync worker uses a new GUC parameter, say > > slot_sync_conninfo, to specify the connection string to the primary > > aside from primary_conninfo. And pg_basebackup -R generates > > slot_sync_conninfo as well if required (new option required). > > > > Yeah, this is worth considering but won't slot_sync_conninfo be mostly > a duplicate of primary_conninfo apart from dbname? I am not sure if > the benefit outweighs the disadvantage of having mostly similar > information in two GUCs. Agreed. Regards, [1] https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-PQCONNECTDBPARAMS -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
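As a concrete illustration of the expand_dbname behaviour quoted above (a frontend-style sketch; in the backend the equivalent change would live in libpqrcv_connect(), which already builds such a keyword/value array; conninfo_from_guc and the chosen database name are placeholders):

    #include <libpq-fe.h>

    /* Expand an existing connection string, then force the dbname: per the
     * docs quoted above, keywords after the expanded dbname override its fields. */
    const char *keys[] = {"dbname", "dbname", NULL};
    const char *vals[] = {conninfo_from_guc,  /* e.g. the value of primary_conninfo */
                          "postgres",         /* overriding database name */
                          NULL};
    PGconn     *conn = PQconnectdbParams(keys, vals, /* expand_dbname = */ 1);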
On Wed, Dec 27, 2023 at 7:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Hi, > > > > Thank you for working on this. > > > > On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > On Wednesday, December 20, 2023 7:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik <shveta.malik@gmail.com> > > > > > wrote: > > > > > > > > > > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik <shveta.malik@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > Thanks for reviewing. I have addressed these in v50. > > > > > > > > > > > > > > > > > > > > > > I was looking at this patch to see if something smaller could be > > > > > > > independently committable. I think we can extract > > > > > > > pg_get_slot_invalidation_cause() and commit it as that function > > > > > > > could be independently useful as well. What do you think? > > > > > > > > > > > > > > > > > > > Sure, forked another thread [1] > > > > > > [1]: > > > > > > > > > > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6 > > > > > N%3D > > > > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com > > > > > > > > > > > > > > > > Thanks, thinking more, we can split the patch into the following three patches > > > > > which can be committed separately (a) Allowing the failover property to be set > > > > > for a slot via SQL API and subscription commands > > > > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the the > > > > > corresponding wait logic in server-side. > > > > > > > > > > Thoughts? > > > > > > > > I agree. Here is the V54 patch set which was split based on the suggestion. > > > > The commit message in each patch is also improved. > > > > > > > > > > I would like to revisit the current dependency of slotsync worker on > > > dbname used in 002 patch. Currently we accept dbname in > > > primary_conninfo and thus the user has to make sure to provide one (by > > > manually altering it) even in case of a conf file auto-generated by > > > "pg_basebackup -R". > > > Thus I would like to discuss if there are better ways to do it. > > > Complete background is as follow: > > > > > > We need dbname for 2 purposes: > > > > > > 1) to connect to remote db in order to run SELECT queries to fetch the > > > info needed by slotsync worker. > > > 2) to make connection in slot-sync worker itself in order to be able > > > to use libpq APIs for 1) > > > > > > We run 3 kind of select queries in slot-sync worker currently: > > > > > > a) To fetch all failover slots (logical slots) info at once in > > > synchronize_slots(). > > > b) To fetch a particular slot info during > > > wait_for_primary_slot_catchup() logic (logical slot). > > > c) To validate primary slot (physical one) and also to distinguish > > > between standby and cascading standby by running pg_is_in_recovery(). > > > > > > 1) One approach to avoid dependency on dbname is using commands > > > instead of SELECT. 
This will need implementing LIST_SLOTS command for > > > a), and for b) we can use LIST_SLOTS and fetch everything (even though > > > it is not needed) or have LIST_SLOTS with a filter on slot-name or > > > extend READ_REPLICATION_SLOT, and for c) we can have some other > > > command to get pg_is_in_recovery() info. But, I feel by relying on > > > commands we will be making the extension of the slot-sync feature > > > difficult. In future, if there is some more requirement to fetch any > > > other info, > > > then there too we have to implement a command. I am not sure if it is > > > good and extensible approach. > > > > > > 2) Another way to avoid asking for a dbname in primary_conninfo is to > > > use the default dbname internally. This brings us to two questions: > > > 'How' and 'Which default db'? > > > > > > 2.1) To answer 'How': > > > Using default dbname is simpler for the purpose of slot-sync worker > > > having its own db-connection, but is a little tricky for the purpose > > > of connection to remote_db. This is because we have to inject this > > > dbname internally in our connection-info. > > > > > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname), > > > then currently it could have 2 formats: > > > > > > a) The simple "=" format for key-value pairs, example: > > > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'. > > > b) URI format, example: > > > postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp > > > > > > We can distinguish between the 2 formats using 'uri_prefix_length' but > > > injecting the dbname part will be messy specially for URI format. If > > > we want to do it w/o injecting and only by changing libpq interfaces > > > to accept dbname separately apart from conninfo, then there is no > > > current simpler way available. It will need a good amount of changes > > > in libpq. > > > > > > 2.1.2) Another way is to not rely on primary_conninfo directly but > > > rely on 'WalRcv->conninfo' in order to connect to remote_db. This is > > > because the latter is never URI format, it is some parsed format and > > > appending may work. As an example, primary_conninfo = > > > 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded > > > internally is: > > > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer > > > dbname=replication host=localhost port=5433 > > > fallback_application_name=walreceiver sslmode=prefer sslcompression=0 > > > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 > > > gssencmode=disable krbsrvname=postgres gssdelegation=0 > > > target_session_attrs=any load_balance_hosts=disable", '\000' > > > > > > So we can try appending our default dbname to this. But all the > > > defaults loaded in WalRcv->conninfo need some careful analysis to > > > figure out if they work for slot-sync worker case. > > > > > > 2.2) Now coming to 'Which default db': > > > > > > 2.2.1) If we use 'template1' as default db, it may block 'create db' > > > operations on primary for the time when the slot-sync worker is > > > connected to remote using this dbname. Example: > > > > > > postgres=# create database newdb1; > > > ERROR: source database "template1" is being accessed by other users > > > DETAIL: There is 1 other session using the database. 
> > > > > > 2.2.2) If we use 'postgres' as default db, there are chances that it > > > can be dropped as unlike 'template1', it is allowed to be dropped by > > > user, and if slotsync worker is connected to it, user may see: > > > newdb1=# drop database postgres; > > > ERROR: database "postgres" is being accessed by other users > > > DETAIL: There is 1 other session using the database. > > > > > > But once the slot-sync worker or standby goes down, user can always > > > drop this and next time slot-sync worker may not be able to come up. > > > > > > > Other random ideas for discussion are: > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > parameter, say slot_sync_dbname, to specify the database to connect. > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > specifies it. If both don't have a dbname, raise an error. > > > > 4) The slotsync worker uses a new GUC parameter, say > > slot_sync_conninfo, to specify the connection string to the primary > > aside from primary_conninfo. And pg_basebackup -R generates > > slot_sync_conninfo as well if required (new option required). > > > > BTW given that the slotsync worker executes only normal SQL queries, > > is there any reason why it uses a replication connection? > > Thank You for the feedback. > Do you mean why are we using libpqwalreceiver.c APIs instead of using > libpq directly? Yes, I meant to use libpq directly, to connect a backend process but not a walsender process. > I was not aware if there is any way to connect if we > want to run SQL queries. I initially tried using 'PQconnectdbParams' > but couldn't make it work. Perhaps it is to be used only by front-end > and extensions as the header files indicate as well: > * libpq-fe.h : This file contains definitions for structures and > externs for functions used by frontend postgres applications. > * libpq-be-fe-helpers.h: Helper functions for using libpq in > extensions . Code built directly into the backend is not allowed to > link to libpq directly. Oh I didn't know that. Thank you for pointing it out. But I'm still concerned it could confuse users that pg_stat_replication keeps showing one entry that remains as "startup" state. It has the same application_name as the walreceiver uses. For example, when users want to check the particular replication connection, it's common to filter the entries by the application name. But it will end up having duplicate entries having different states. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Dec 29, 2023 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Dec 27, 2023 at 7:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I was not aware if there is any way to connect if we > > want to run SQL queries. I initially tried using 'PQconnectdbParams' > > but couldn't make it work. Perhaps it is to be used only by front-end > > and extensions as the header files indicate as well: > > * libpq-fe.h : This file contains definitions for structures and > > externs for functions used by frontend postgres applications. > > * libpq-be-fe-helpers.h: Helper functions for using libpq in > > extensions . Code built directly into the backend is not allowed to > > link to libpq directly. > > Oh I didn't know that. Thank you for pointing it out. > > But I'm still concerned it could confuse users that > pg_stat_replication keeps showing one entry that remains as "startup" > state. It has the same application_name as the walreceiver uses. For > example, when users want to check the particular replication > connection, it's common to filter the entries by the application name. > But it will end up having duplicate entries having different states. > Valid point. The main reason for using cluster_name is that if multiple standbys connect to the same primary, their slot-sync workers would otherwise all have the same application_name, 'slotsyncworker'. The other alternative could be to use {cluster_name}_slotsyncworker, which will probably address your concern and would also give us a way to differentiate among the slot-sync workers of different standbys. -- With Regards, Amit Kapila.
On Fri, Dec 29, 2023 at 6:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > > parameter, say slot_sync_dbname, to specify the database to connect. > > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > > specifies it. If both don't have a dbname, raise an error. > > > > > > > Would the users prefer to provide a value for a separate GUC instead > > of changing primary_conninfo? It is possible that we can have some > > users prefer to use one GUC and others prefer a separate GUC but we > > should add a new GUC if we are sure that is what users would prefer. > > Also, even if have to consider this option, I think we can easily > > later add a new GUC to provide a dbname in addition to having the > > provision of giving it in primary_conninfo. > > I think having two separate GUCs is more flexible for example when > users want to change the dbname to connect. It makes sense that the > slotsync worker wants to use the same connection string as the > walreceiver uses. But I guess today most primary_conninfo settings > that are set manually or are generated by tools such as pg_basebackup > don't have dbname. If we require a dbname in primary_conninfo, many > tools will need to be changed. Once the connection string is > generated, it would be tricky to change the dbname in it, as Shveta > mentioned. The users will have to carefully select the database to > connect when taking a base backup. > I see your point and agree that users need to be careful. I was trying to compare it with other places like the conninfo used with a subscription where no separate dbname needs to be provided. Now, here the situation is not the same because the same conninfo is used for different purposes (walreceiver doesn't require dbname (dbname is ignored even if present) whereas slotsyncworker requires dbname). I was just trying to see if we can avoid having a new GUC for this purpose. Does anyone else have an opinion on this matter? -- With Regards, Amit Kapila.
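For reference, a minimal sketch of what the no-new-GUC approach discussed above would look like on the standby, with the dbname simply included in primary_conninfo (host, port, user, and database are illustrative values only):

  -- On the standby: the walreceiver ignores dbname, while the slot-sync
  -- worker would use it for its SQL connection to the primary.
  ALTER SYSTEM SET primary_conninfo = 'host=primary.example.com port=5432 user=replicator dbname=postgres';
  SELECT pg_reload_conf();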
On Fri, Dec 29, 2023 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 29, 2023 at 6:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > > > parameter, say slot_sync_dbname, to specify the database to connect. > > > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > > > specifies it. If both don't have a dbname, raise an error. > > > > > > > > > > Would the users prefer to provide a value for a separate GUC instead > > > of changing primary_conninfo? It is possible that we can have some > > > users prefer to use one GUC and others prefer a separate GUC but we > > > should add a new GUC if we are sure that is what users would prefer. > > > Also, even if have to consider this option, I think we can easily > > > later add a new GUC to provide a dbname in addition to having the > > > provision of giving it in primary_conninfo. > > > > I think having two separate GUCs is more flexible for example when > > users want to change the dbname to connect. It makes sense that the > > slotsync worker wants to use the same connection string as the > > walreceiver uses. But I guess today most primary_conninfo settings > > that are set manually or are generated by tools such as pg_basebackup > > don't have dbname. If we require a dbname in primary_conninfo, many > > tools will need to be changed. Once the connection string is > > generated, it would be tricky to change the dbname in it, as Shveta > > mentioned. The users will have to carefully select the database to > > connect when taking a base backup. > > > > I see your point and agree that users need to be careful. I was trying > to compare it with other places like the conninfo used with a > subscription where no separate dbname needs to be provided. Now, here > the situation is not the same because the same conninfo is used for > different purposes (walreceiver doesn't require dbname (dbname is > ignored even if present) whereas slotsyncworker requires dbname). I > was just trying to see if we can avoid having a new GUC for this > purpose. Does anyone else have an opinion on this matter? > > -- > With Regards, > Amit Kapila. Attaching the rebased patches. A recent commit 9a17be1e2 has resulted in conflicts in pg_dump changes. thanks Shveta
Attachment
On Fri, Dec 29, 2023 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 29, 2023 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Dec 27, 2023 at 7:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > I was not aware if there is any way to connect if we > > > want to run SQL queries. I initially tried using 'PQconnectdbParams' > > > but couldn't make it work. Perhaps it is to be used only by front-end > > > and extensions as the header files indicate as well: > > > * libpq-fe.h : This file contains definitions for structures and > > > externs for functions used by frontend postgres applications. > > > * libpq-be-fe-helpers.h: Helper functions for using libpq in > > > extensions . Code built directly into the backend is not allowed to > > > link to libpq directly. > > > > Oh I didn't know that. Thank you for pointing it out. > > > > But I'm still concerned it could confuse users that > > pg_stat_replication keeps showing one entry that remains as "startup" > > state. Okay. I understand your concern. I have attached a PoC patch (v55_02-0004) which attempts to implement a non-replication connection in the slotsync worker. By doing so, pg_stat_replication will not show its entry, while pg_stat_activity will still show it with 'state' as either "active" or "idle". Currently, since we are not using any of the replication commands, a non-replication connection suits well. But in the future, if there is a requirement to execute an existing (or new) replication command in the slotsync worker, then that cannot be done over a non-replication connection; it will either need some changes on the non-replication side or will need the replication connection itself. >> It has the same application_name as the walreceiver uses. For > > example, when users want to check the particular replication > > connection, it's common to filter the entries by the application name. > > But it will end up having duplicate entries having different states. > > > > Valid point. The main reason for using cluster_name is that if > multiple standby's connect to the same primary, all will have the same > application_name as 'slotsyncworker'. The other alternative could be > to use {cluster_name}_slotsyncworker, which will probably address your > concern and we can have to provision to differentiate among > slotsyncworkers from different standby's. The topup patch has also changed app_name to {cluster_name}_slotsyncworker so that we do not confuse the walreceiver and slotsyncworker entries. Please note that there is no change in the rest of the patches; the changes are in the additional 0004 patch alone. > -- > With Regards, > Amit Kapila.
Attachment
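A small sketch of how the behavior of the PoC described above could be verified (the application_name pattern assumes the {cluster_name}_slotsyncworker naming proposed in the 0004 patch):

  -- On the primary: with a non-replication connection, the slot-sync worker
  -- should no longer appear in pg_stat_replication ...
  SELECT application_name, state FROM pg_stat_replication;
  -- ... but should still be visible in pg_stat_activity, with state
  -- 'active' or 'idle'.
  SELECT application_name, state
  FROM pg_stat_activity
  WHERE application_name LIKE '%slotsyncworker';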
On Fri, Dec 29, 2023 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 29, 2023 at 6:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > > > parameter, say slot_sync_dbname, to specify the database to connect. > > > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > > > specifies it. If both don't have a dbname, raise an error. > > > > > > > > > > Would the users prefer to provide a value for a separate GUC instead > > > of changing primary_conninfo? It is possible that we can have some > > > users prefer to use one GUC and others prefer a separate GUC but we > > > should add a new GUC if we are sure that is what users would prefer. > > > Also, even if have to consider this option, I think we can easily > > > later add a new GUC to provide a dbname in addition to having the > > > provision of giving it in primary_conninfo. > > > > I think having two separate GUCs is more flexible for example when > > users want to change the dbname to connect. It makes sense that the > > slotsync worker wants to use the same connection string as the > > walreceiver uses. But I guess today most primary_conninfo settings > > that are set manually or are generated by tools such as pg_basebackup > > don't have dbname. If we require a dbname in primary_conninfo, many > > tools will need to be changed. Once the connection string is > > generated, it would be tricky to change the dbname in it, as Shveta > > mentioned. The users will have to carefully select the database to > > connect when taking a base backup. > > > > I see your point and agree that users need to be careful. I was trying > to compare it with other places like the conninfo used with a > subscription where no separate dbname needs to be provided. Now, here > the situation is not the same because the same conninfo is used for > different purposes (walreceiver doesn't require dbname (dbname is > ignored even if present) whereas slotsyncworker requires dbname). I > was just trying to see if we can avoid having a new GUC for this > purpose. Does anyone else have an opinion on this matter? > Bertrand, Dilip, and others involved in this thread or otherwise, see if you can share an opinion on the above point because it would be good to get some more opinions before we decide to add a new GUC (for dbname) for slotsync worker. -- With Regards, Amit Kapila.
Hi, On Wed, Jan 03, 2024 at 04:20:03PM +0530, Amit Kapila wrote: > On Fri, Dec 29, 2023 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Dec 29, 2023 at 6:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > > > > parameter, say slot_sync_dbname, to specify the database to connect. > > > > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > > > > specifies it. If both don't have a dbname, raise an error. > > > > > > > > > > > > > Would the users prefer to provide a value for a separate GUC instead > > > > of changing primary_conninfo? It is possible that we can have some > > > > users prefer to use one GUC and others prefer a separate GUC but we > > > > should add a new GUC if we are sure that is what users would prefer. > > > > Also, even if have to consider this option, I think we can easily > > > > later add a new GUC to provide a dbname in addition to having the > > > > provision of giving it in primary_conninfo. > > > > > > I think having two separate GUCs is more flexible for example when > > > users want to change the dbname to connect. It makes sense that the > > > slotsync worker wants to use the same connection string as the > > > walreceiver uses. But I guess today most primary_conninfo settings > > > that are set manually or are generated by tools such as pg_basebackup > > > don't have dbname. If we require a dbname in primary_conninfo, many > > > tools will need to be changed. Once the connection string is > > > generated, it would be tricky to change the dbname in it, as Shveta > > > mentioned. The users will have to carefully select the database to > > > connect when taking a base backup. > > > > > > > I see your point and agree that users need to be careful. I was trying > > to compare it with other places like the conninfo used with a > > subscription where no separate dbname needs to be provided. Now, here > > the situation is not the same because the same conninfo is used for > > different purposes (walreceiver doesn't require dbname (dbname is > > ignored even if present) whereas slotsyncworker requires dbname). I > > was just trying to see if we can avoid having a new GUC for this > > purpose. Does anyone else have an opinion on this matter? > > > > Bertrand, Dilip, and others involved in this thread or otherwise, see > if you can share an opinion on the above point because it would be > good to get some more opinions before we decide to add a new GUC (for > dbname) for slotsync worker. > I think that as long as enable_syncslot is off then there is no need to add the dbname in primary_conninfo (means there is no need to change an existing primary_conninfo for the ones that don't use the sync slot feature). So given that primary_conninfo does not necessary need to be changed (for ones that don't use the sync slot feature) and that adding a new GUC looks more a one-way door change to me, I'd vote to keep the patch as it is (we can still revisit this later on and add a new GUC if we feel the need based on user's feedback). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, January 2, 2024 6:32 PM shveta malik <shveta.malik@gmail.com> wrote: > On Fri, Dec 29, 2023 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> > > The topup patch has also changed app_name to > {cluster_name}_slotsyncworker so that we do not confuse between walreceiver > and slotsyncworker entry. > > Please note that there is no change in rest of the patches, changes are in > additional 0004 patch alone. Attached is the V56 patch set, which supports ALTER SUBSCRIPTION SET (failover). This is useful when users want to refresh the publication tables: they can now set the failover option to false and then execute the refresh command. Best Regards, Hou zj
Attachment
On Wed, Jan 3, 2024 at 6:33 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, January 2, 2024 6:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Dec 29, 2023 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> > > > > The topup patch has also changed app_name to > > {cluster_name}_slotsyncworker so that we do not confuse between walreceiver > > and slotsyncworker entry. > > > > Please note that there is no change in rest of the patches, changes are in > > additional 0004 patch alone. > > Attach the V56 patch set which supports ALTER SUBSCRIPTION SET (failover). > This is useful when user want to refresh the publication tables, they can now alter the > failover option to false and then execute the refresh command. > > Best Regards, > Hou zj The patches no longer apply to HEAD due to a recent commit 007693f. I am working on rebasing and will post the new patches soon thanks Shveta
On Thu, Jan 4, 2024 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jan 3, 2024 at 6:33 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Tuesday, January 2, 2024 6:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > > On Fri, Dec 29, 2023 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> > > > > > > The topup patch has also changed app_name to > > > {cluster_name}_slotsyncworker so that we do not confuse between walreceiver > > > and slotsyncworker entry. > > > > > > Please note that there is no change in rest of the patches, changes are in > > > additional 0004 patch alone. > > > > Attach the V56 patch set which supports ALTER SUBSCRIPTION SET (failover). > > This is useful when user want to refresh the publication tables, they can now alter the > > failover option to false and then execute the refresh command. > > > > Best Regards, > > Hou zj > > The patches no longer apply to HEAD due to a recent commit 007693f. I > am working on rebasing and will post the new patches soon > > thanks > Shveta Commit 007693f has changed 'conflicting' to 'conflict_reason', so adjusted the code around that in the slotsync worker. Also removed function 'pg_get_slot_invalidation_cause' as now conflict_reason tells the same. PFA rebased patches with above changes. thanks Shveta
Attachment
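For reference, a quick way to see what the rename means in practice (a sketch; the exact column set depends on the commit and the applied patches):

  -- 'conflict_reason' replaces the old boolean 'conflicting' column and
  -- reports the invalidation cause that pg_get_slot_invalidation_cause()
  -- used to return.
  SELECT slot_name, conflict_reason
  FROM pg_replication_slots
  WHERE slot_type = 'logical';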
On Wed, Jan 3, 2024 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Dec 29, 2023 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I see your point and agree that users need to be careful. I was trying > > to compare it with other places like the conninfo used with a > > subscription where no separate dbname needs to be provided. Now, here > > the situation is not the same because the same conninfo is used for > > different purposes (walreceiver doesn't require dbname (dbname is > > ignored even if present) whereas slotsyncworker requires dbname). I > > was just trying to see if we can avoid having a new GUC for this > > purpose. Does anyone else have an opinion on this matter? > > > > Bertrand, Dilip, and others involved in this thread or otherwise, see > if you can share an opinion on the above point because it would be > good to get some more opinions before we decide to add a new GUC (for > dbname) for slotsync worker. IMHO, as of now we can rely on just primary_conninfo and let the user modify it to add the dbname. In the future, if this creates some discomfort or we see complaints about the usage, then we can expand the behavior by providing an additional GUC for the dbname. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jan 3, 2024 at 4:57 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Wed, Jan 03, 2024 at 04:20:03PM +0530, Amit Kapila wrote: > > On Fri, Dec 29, 2023 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Dec 29, 2023 at 6:59 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > > > > 3) The slotsync worker uses primary_conninfo but also uses a new GUC > > > > > > parameter, say slot_sync_dbname, to specify the database to connect. > > > > > > The slot_sync_dbname overwrites the dbname if primary_conninfo also > > > > > > specifies it. If both don't have a dbname, raise an error. > > > > > > > > > > > > > > > > Would the users prefer to provide a value for a separate GUC instead > > > > > of changing primary_conninfo? It is possible that we can have some > > > > > users prefer to use one GUC and others prefer a separate GUC but we > > > > > should add a new GUC if we are sure that is what users would prefer. > > > > > Also, even if have to consider this option, I think we can easily > > > > > later add a new GUC to provide a dbname in addition to having the > > > > > provision of giving it in primary_conninfo. > > > > > > > > I think having two separate GUCs is more flexible for example when > > > > users want to change the dbname to connect. It makes sense that the > > > > slotsync worker wants to use the same connection string as the > > > > walreceiver uses. But I guess today most primary_conninfo settings > > > > that are set manually or are generated by tools such as pg_basebackup > > > > don't have dbname. If we require a dbname in primary_conninfo, many > > > > tools will need to be changed. Once the connection string is > > > > generated, it would be tricky to change the dbname in it, as Shveta > > > > mentioned. The users will have to carefully select the database to > > > > connect when taking a base backup. > > > > > > > > > > I see your point and agree that users need to be careful. I was trying > > > to compare it with other places like the conninfo used with a > > > subscription where no separate dbname needs to be provided. Now, here > > > the situation is not the same because the same conninfo is used for > > > different purposes (walreceiver doesn't require dbname (dbname is > > > ignored even if present) whereas slotsyncworker requires dbname). I > > > was just trying to see if we can avoid having a new GUC for this > > > purpose. Does anyone else have an opinion on this matter? > > > > > > > Bertrand, Dilip, and others involved in this thread or otherwise, see > > if you can share an opinion on the above point because it would be > > good to get some more opinions before we decide to add a new GUC (for > > dbname) for slotsync worker. > > > > I think that as long as enable_syncslot is off then there is no need to add the > dbname in primary_conninfo (means there is no need to change an existing primary_conninfo > for the ones that don't use the sync slot feature). > > So given that primary_conninfo does not necessary need to be changed (for ones that > don't use the sync slot feature) and that adding a new GUC looks more a one-way door > change to me, I'd vote to keep the patch as it is (we can still revisit this later > on and add a new GUC if we feel the need based on user's feedback). > Okay, thanks for the feedback. 
Dilip also shares the same opinion, so let's wait and see if there is any strong argument to add this new GUC. -- With Regards, Amit Kapila.
Hi, On Thu, Jan 04, 2024 at 10:27:31AM +0530, shveta malik wrote: > On Thu, Jan 4, 2024 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Jan 3, 2024 at 6:33 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Tuesday, January 2, 2024 6:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Dec 29, 2023 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> > > > > > > > > The topup patch has also changed app_name to > > > > {cluster_name}_slotsyncworker so that we do not confuse between walreceiver > > > > and slotsyncworker entry. > > > > > > > > Please note that there is no change in rest of the patches, changes are in > > > > additional 0004 patch alone. > > > > > > Attach the V56 patch set which supports ALTER SUBSCRIPTION SET (failover). > > > This is useful when user want to refresh the publication tables, they can now alter the > > > failover option to false and then execute the refresh command. > > > > > > Best Regards, > > > Hou zj > > > > The patches no longer apply to HEAD due to a recent commit 007693f. I > > am working on rebasing and will post the new patches soon > > > > thanks > > Shveta > > Commit 007693f has changed 'conflicting' to 'conflict_reason', so > adjusted the code around that in the slotsync worker. > > Also removed function 'pg_get_slot_invalidation_cause' as now > conflict_reason tells the same. > > PFA rebased patches with above changes. > Thanks! Looking at 0004: 1 ==== -libpqrcv_connect(const char *conninfo, bool logical, bool must_use_password, - const char *appname, char **err) +libpqrcv_connect(const char *conninfo, bool replication, bool logical, + bool must_use_password, const char *appname, char **err) What about adjusting the preceding comment a bit to describe what the new replication parameter is for? 2 ==== + /* We can not have logical w/o replication */ what about replacing w/o by without? 3 === + if(!replication) + Assert(!logical); + + if (replication) { what about using "if () else" instead (to avoid unnecessary test)? Having said that the patch seems a reasonable way to implement non-replication connection in slotsync worker. 4 === Looking closer, the only place where walrcv_connect() is called with replication set to false and logical set to false is in ReplSlotSyncWorkerMain(). That does make sense, but what do you think about creating dedicated libpqslotsyncwrkr_connect and slotsyncwrkr_connect (instead of using the libpqrcv_connect / walrcv_connect ones)? That way we could make use of slotsyncwrkr_connect() in ReplSlotSyncWorkerMain() as I think it's confusing to use "rcv" functions while the process using them is not of backend type walreceiver. I'm not sure that worth the extra complexity though, what do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jan 4, 2024 at 7:24 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Jan 04, 2024 at 10:27:31AM +0530, shveta malik wrote: > > On Thu, Jan 4, 2024 at 9:18 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Wed, Jan 3, 2024 at 6:33 PM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > On Tuesday, January 2, 2024 6:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > On Fri, Dec 29, 2023 at 10:25 AM Amit Kapila <amit.kapila16@gmail.com> > > > > > > > > > > The topup patch has also changed app_name to > > > > > {cluster_name}_slotsyncworker so that we do not confuse between walreceiver > > > > > and slotsyncworker entry. > > > > > > > > > > Please note that there is no change in rest of the patches, changes are in > > > > > additional 0004 patch alone. > > > > > > > > Attach the V56 patch set which supports ALTER SUBSCRIPTION SET (failover). > > > > This is useful when user want to refresh the publication tables, they can now alter the > > > > failover option to false and then execute the refresh command. > > > > > > > > Best Regards, > > > > Hou zj > > > > > > The patches no longer apply to HEAD due to a recent commit 007693f. I > > > am working on rebasing and will post the new patches soon > > > > > > thanks > > > Shveta > > > > Commit 007693f has changed 'conflicting' to 'conflict_reason', so > > adjusted the code around that in the slotsync worker. > > > > Also removed function 'pg_get_slot_invalidation_cause' as now > > conflict_reason tells the same. > > > > PFA rebased patches with above changes. > > > > Thanks! > > Looking at 0004: > > 1 ==== > > -libpqrcv_connect(const char *conninfo, bool logical, bool must_use_password, > - const char *appname, char **err) > +libpqrcv_connect(const char *conninfo, bool replication, bool logical, > + bool must_use_password, const char *appname, char **err) > > What about adjusting the preceding comment a bit to describe what the new replication > parameter is for? > > 2 ==== > > + /* We can not have logical w/o replication */ > > what about replacing w/o by without? > > 3 === > > + if(!replication) > + Assert(!logical); > + > + if (replication) > { > > what about using "if () else" instead (to avoid unnecessary test)? > > > Having said that the patch seems a reasonable way to implement non-replication > connection in slotsync worker. > > 4 === > > Looking closer, the only place where walrcv_connect() is called with replication > set to false and logical set to false is in ReplSlotSyncWorkerMain(). > > That does make sense, but what do you think about creating dedicated libpqslotsyncwrkr_connect > and slotsyncwrkr_connect (instead of using the libpqrcv_connect / walrcv_connect ones)? > > That way we could make use of slotsyncwrkr_connect() in ReplSlotSyncWorkerMain() > as I think it's confusing to use "rcv" functions while the process using them is > not of backend type walreceiver. > > I'm not sure that worth the extra complexity though, what do you think? I gave it a thought earlier, but then I was not sure even if I create a new function w/o "rcv" in it then where should it be placed as the existing file name itself is libpq'walreceiver'.c. Shall we be creating a new file then? But it does not seem good to create a new setup (new file, function pointers other stuff) around 1 function. And thus reusing the same function with 'replication' (new arg) felt like a better choice than other options. 
If, in the future, any other module needs to do the same, it can use the current walrcv_connect() with replication=false. If I make it specific to the slot-sync worker, then it will not be reusable by other modules (if needed). thanks Shveta
On Fri, Jan 5, 2024 at 8:59 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Jan 4, 2024 at 7:24 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > 4 === > > > > Looking closer, the only place where walrcv_connect() is called with replication > > set to false and logical set to false is in ReplSlotSyncWorkerMain(). > > > > That does make sense, but what do you think about creating dedicated libpqslotsyncwrkr_connect > > and slotsyncwrkr_connect (instead of using the libpqrcv_connect / walrcv_connect ones)? > > > > That way we could make use of slotsyncwrkr_connect() in ReplSlotSyncWorkerMain() > > as I think it's confusing to use "rcv" functions while the process using them is > > not of backend type walreceiver. > > > > I'm not sure that worth the extra complexity though, what do you think? > > I gave it a thought earlier, but then I was not sure even if I create > a new function w/o "rcv" in it then where should it be placed as the > existing file name itself is libpq'walreceiver'.c. Shall we be > creating a new file then? But it does not seem good to create a new > setup (new file, function pointers other stuff) around 1 function. > And thus reusing the same function with 'replication' (new arg) felt > like a better choice than other options. If in future, there is any > other module trying to do the same, then it can use current > walrcv_connect() with rep=false. If I make it specific to slot-sync > worker, then it will not be reusable by other modules (if needed). > I agree that the benefit of creating a new API is not very clear. How about adjusting the description in the file header of libpqwalreceiver.c. I think apart from walreceiver, it is now also used by logical replication workers and with this patch by the slotsync worker as well. -- With Regards, Amit Kapila.
Hi, On Fri, Jan 05, 2024 at 10:00:53AM +0530, Amit Kapila wrote: > On Fri, Jan 5, 2024 at 8:59 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Jan 4, 2024 at 7:24 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > 4 === > > > > > > Looking closer, the only place where walrcv_connect() is called with replication > > > set to false and logical set to false is in ReplSlotSyncWorkerMain(). > > > > > > That does make sense, but what do you think about creating dedicated libpqslotsyncwrkr_connect > > > and slotsyncwrkr_connect (instead of using the libpqrcv_connect / walrcv_connect ones)? > > > > > > That way we could make use of slotsyncwrkr_connect() in ReplSlotSyncWorkerMain() > > > as I think it's confusing to use "rcv" functions while the process using them is > > > not of backend type walreceiver. > > > > > > I'm not sure that worth the extra complexity though, what do you think? > > > > I gave it a thought earlier, but then I was not sure even if I create > > a new function w/o "rcv" in it then where should it be placed as the > > existing file name itself is libpq'walreceiver'.c. Shall we be > > creating a new file then? But it does not seem good to create a new > > setup (new file, function pointers other stuff) around 1 function. Yeah... > > And thus reusing the same function with 'replication' (new arg) felt > > like a better choice than other options. If in future, there is any > > other module trying to do the same, then it can use current > > walrcv_connect() with rep=false. If I make it specific to slot-sync > > worker, then it will not be reusable by other modules (if needed). Yeah good point, it would need to be more generic. > I agree that the benefit of creating a new API is not very clear. Yeah, that would be more for cosmetic purpose (and avoid using a WalReceiverConn while a PGconn could/should suffice). > How > about adjusting the description in the file header of > libpqwalreceiver.c. Agree, that seems to be a better option (not sure that building the new API is worth the extra work). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Jan 5, 2024 at 8:59 AM shveta malik <shveta.malik@gmail.com> wrote: > I was going through the patch set again, and I have a question. The below comments say that we keep the failover option as PENDING until we have done the initial table sync, which seems fine. But what happens if we add a new table to the publication and refresh the subscription? In such a case, does this go back to the PENDING state or something else? + * As a result, we enable the failover option for the main slot only after the + * initial sync is complete. The failover option is implemented as a tri-state + * with values DISABLED, PENDING, and ENABLED. The state transition process + * between these values is the same as the two_phase option (see TWO_PHASE + * TRANSACTIONS for details). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 5, 2024 at 4:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jan 5, 2024 at 8:59 AM shveta malik <shveta.malik@gmail.com> wrote: > > > I was going the the patch set again, I have a question. The below > comments say that we keep the failover option as PENDING until we have > done the initial table sync which seems fine. But what happens if we > add a new table to the publication and refresh the subscription? In > such a case does this go back to the PENDING state or something else? > At this stage, such an operation is prohibited. Users need to disable the failover option first, then perform the above operation, and after that failover option can be re-enabled. -- With Regards, Amit Kapila.
On Fri, Jan 5, 2024 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 5, 2024 at 4:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Jan 5, 2024 at 8:59 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > I was going the the patch set again, I have a question. The below > > comments say that we keep the failover option as PENDING until we have > > done the initial table sync which seems fine. But what happens if we > > add a new table to the publication and refresh the subscription? In > > such a case does this go back to the PENDING state or something else? > > > > At this stage, such an operation is prohibited. Users need to disable > the failover option first, then perform the above operation, and after > that failover option can be re-enabled. Okay, that makes sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
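A sketch of the procedure described above for adding a new table to the publication (the subscription name is hypothetical):

  -- Disable failover, perform the refresh, then re-enable failover.
  ALTER SUBSCRIPTION regress_mysub1 SET (failover = false);
  ALTER SUBSCRIPTION regress_mysub1 REFRESH PUBLICATION;
  ALTER SUBSCRIPTION regress_mysub1 SET (failover = true);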
Here are some review comments for patch v57-0001. ====== doc/src/sgml/protocol.sgml 1. CREATE_REPLICATION_SLOT ... FAILOVER + <varlistentry> + <term><literal>FAILOVER [ <replaceable class="parameter">boolean</replaceable> ]</literal></term> + <listitem> + <para> + If true, the slot is enabled to be synced to the physical + standbys so that logical replication can be resumed after failover. + </para> + </listitem> + </varlistentry> This syntax says passing the boolean value is optional. So the default needs to the specified here in the docs (like what the TWO_PHASE option does). ~~~ 2. ALTER_REPLICATION_SLOT ... FAILOVER + <variablelist> + <varlistentry> + <term><literal>FAILOVER [ <replaceable class="parameter">boolean</replaceable> ]</literal></term> + <listitem> + <para> + If true, the slot is enabled to be synced to the physical + standbys so that logical replication can be resumed after failover. + </para> + </listitem> + </varlistentry> + </variablelist> This syntax says passing the boolean value is optional. So it needs to be specified here in the docs that not passing a value would be the same as passing the value true. ====== doc/src/sgml/ref/alter_subscription.sgml 3. + If <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + is enabled, you can temporarily disable it in order to execute these commands. /in order to/to/ ~~~ 4. + <para> + When altering the + <link linkend="sql-createsubscription-params-with-slot-name"><literal>slot_name</literal></link>, + the <literal>failover</literal> property of the new slot may differ from the + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + parameter specified in the subscription. When creating the slot, + ensure the slot failover property matches the + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + parameter value of the subscription. + </para> 4a. the <literal>failover</literal> property of the new slot may differ Maybe it would be more clear if that said "the failover property value of the named slot...". ~ 4b. In the "failover property matches" part should that failover also be rendered as <literal> like before in the same paragraph? ====== doc/src/sgml/system-views.sgml 5. + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>failover</structfield> <type>bool</type> + </para> + <para> + True if this logical slot is enabled to be synced to the physical + standbys so that logical replication can be resumed from the new primary + after failover. Always false for physical slots. + </para></entry> + </row> /True if this logical slot is enabled.../True if this is a logical slot enabled.../ ====== src/backend/commands/subscriptioncmds.c 6. CreateSubscription + /* + * Even if failover is set, don't create the slot with failover + * enabled. Will enable it once all the tables are synced and + * ready. The intention is that if failover happens at the time of + * table-sync, user should re-launch the subscription instead of + * relying on main slot (if synced) with no table-sync data + * present. When the subscription has no tables, leave failover as + * false to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to + * work. + */ + if (opts.failover && !opts.copy_data && tables != NIL) + failover_enabled = true; AFAICT it might be possible for this to set failover_enabled = true if copy_data is false. 
So failover_enabled would be true when later calling: walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled, failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL); Isn't that contrary to what this comment said: "Even if failover is set, don't create the slot with failover enabled" ~~~ 7. AlterSubscription. case ALTER_SUBSCRIPTION_OPTIONS: + if (!sub->slotname) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set failover for subscription that does not have slot name"))); /for subscription that does not have slot name/for a subscription that does not have a slot name/ ====== .../libpqwalreceiver/libpqwalreceiver.c 8. + if (PQresultStatus(res) != PGRES_COMMAND_OK) + ereport(ERROR, + (errcode(ERRCODE_PROTOCOL_VIOLATION), + errmsg("could not alter replication slot \"%s\"", + slotname))); This used to display the error message like pchomp(PQerrorMessage(conn->streamConn)) but it was removed. Is it OK? ====== src/backend/replication/logical/tablesync.c 9. + if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING) + ereport(LOG, + /* translator: %s is a subscription option */ + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that %s can be enabled", + MySubscription->name, "two_phase"))); + + if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING) + ereport(LOG, + /* translator: %s is a subscription option */ + (errmsg("logical replication apply worker for subscription \"%s\" will restart so that %s can be enabled", + MySubscription->name, "failover"))); Those errors have multiple %s, so the translator's comment should say "the 2nd %s is a..." ~~~ 10. void -UpdateTwoPhaseState(Oid suboid, char new_state) +EnableTwoPhaseFailoverTriState(Oid suboid, bool enable_twophase, + bool enable_failover) I felt the function name was a bit confusing. Maybe it is simpler to call it like "EnableTriState" or "EnableSubTriState" -- the parameters anyway specify what actual state(s) will be set. ====== src/backend/replication/logical/worker.c 11. + /* Update twophase and/or failover */ + EnableTwoPhaseFailoverTriState(MySubscription->oid, twophase_pending, + failover_pending); + if (twophase_pending) + MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED; + + if (failover_pending) + MySubscription->failoverstate = LOGICALREP_FAILOVER_STATE_ENABLED; Can't you pass the MySubscription as a parameter and then the EnableTwoPhaseFailoverTriState can also set these LOGICALREP_TWOPHASE_STATE_ENABLED/LOGICALREP_FAILOVER_STATE_ENABLED states within the Enable* function? ====== src/backend/replication/repl_gram.y 12. %token K_CREATE_REPLICATION_SLOT %token K_DROP_REPLICATION_SLOT +%token K_ALTER_REPLICATION_SLOT and + create_replication_slot drop_replication_slot + alter_replication_slot identify_system read_replication_slot + timeline_history show upload_manifest and | create_replication_slot | drop_replication_slot + | alter_replication_slot and | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; } | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; } + | K_ALTER_REPLICATION_SLOT { $$ = "alter_replication_slot"; } etc. ~ Although it makes no difference IMO it is more natural to code everything in the order: create, alter, drop. ====== src/backend/replication/repl_scanner.l 13. 
CREATE_REPLICATION_SLOT { return K_CREATE_REPLICATION_SLOT; } DROP_REPLICATION_SLOT { return K_DROP_REPLICATION_SLOT; } +ALTER_REPLICATION_SLOT { return K_ALTER_REPLICATION_SLOT; } and case K_CREATE_REPLICATION_SLOT: case K_DROP_REPLICATION_SLOT: + case K_ALTER_REPLICATION_SLOT: Although it makes no difference IMO it is more natural to code everything in the order: create, alter, drop. ====== src/backend/replication/slot.c 14. + if (SlotIsPhysical(MyReplicationSlot)) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot use %s with a physical replication slot", + "ALTER_REPLICATION_SLOT")); /with a/for a/ ====== src/backend/replication/walsender.c 15. +static void +ParseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover) +{ + ListCell *lc; + bool failover_given = false; + + /* Parse options */ + foreach(lc, cmd->options) + { + DefElem *defel = (DefElem *) lfirst(lc); AFAIK there are some new-style macros now you can use for this code. e.g. foreach_ptr? See [1]. ~~~ 16. + if (strcmp(defel->defname, "failover") == 0) + { + if (failover_given) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + failover_given = true; + *failover = defGetBoolean(defel); + } The documented syntax showed that passing the boolean value for the FAILOVER option is not mandatory. Does this code work if the boolean value is not passed? ====== src/bin/psql/tab-complete.c 17. I think "ALTER SUBSCRIPTION ... SET (failover)" is possible, but the ALTER SUBSCRIPTION tab completion code is missing. ====== src/include/nodes/replnodes.h 18. +/* ---------------------- + * ALTER_REPLICATION_SLOT command + * ---------------------- + */ +typedef struct AlterReplicationSlotCmd +{ + NodeTag type; + char *slotname; + List *options; +} AlterReplicationSlotCmd; + + Same as an earlier comment. Although it makes no difference IMO it is more natural to define these structs in the order: CreateReplicationSlotCmd, then AlterReplicationSlotCmd, then DropReplicationSlotCmd. ====== .../t/050_standby_failover_slots_sync.pl 19. + +# Copyright (c) 2023, PostgreSQL Global Development Group + /2023/2024/ ~~~ 20. +# Create another subscription (using the same slot created above) that enables +# failover. +$subscriber1->safe_psql( + 'postgres', qq[ + CREATE TABLE tab_int (a int PRIMARY KEY); + CREATE SUBSCRIPTION regress_mysub1 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data=false, failover = true, create_slot = false); The comment should not say "Create another subscription" because this is the first subscription being created. /another/a/ ~~~ 21. +################################################## +# Test if changing the failover property of a subscription updates the +# corresponding failover property of the slot. +################################################## /Test if/Test that/ ====== src/test/regress/sql/subscription.sql 22. +CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, failover = true); + +\dRs+ This is currently only testing the explicit "failover=true". Maybe you can also test the other kinds work as expected: - explicit "SET (failover=false)" - explicit "SET (failover)" with no value specified ====== [1] https://github.com/postgres/postgres/commit/14dd0f27d7cd56ffae9ecdbe324965073d01a9ff Kind Regards, Peter Smith. Fujitsu Australia
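A sketch of what the additional cases suggested in comment 22 above might look like, following the style of the existing test (the subscription names are hypothetical):

  -- explicit failover = false
  CREATE SUBSCRIPTION regress_testsub2 CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, failover = false);
  \dRs+
  -- explicit failover with no value specified (defGetBoolean treats it as true)
  CREATE SUBSCRIPTION regress_testsub3 CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, failover);
  \dRs+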
On Monday, January 8, 2024 2:10 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jan 5, 2024 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 5, 2024 at 4:25 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Jan 5, 2024 at 8:59 AM shveta malik <shveta.malik@gmail.com> > wrote: > > > > > > > I was going the the patch set again, I have a question. The below > > > comments say that we keep the failover option as PENDING until we > > > have done the initial table sync which seems fine. But what happens > > > if we add a new table to the publication and refresh the > > > subscription? In such a case does this go back to the PENDING state or > something else? > > > > > > > At this stage, such an operation is prohibited. Users need to disable > > the failover option first, then perform the above operation, and after > > that failover option can be re-enabled. > > Okay, that makes sense to me. During the off-list discussion, Sawada-san proposed an idea which can release the restriction for table sync: instead of relying on the latest WAL position, we can utilize the remote restart_lsn to reserve the WAL when creating a new synced slot on the standby. This approach eliminates the need to wait for the primary server to catch up, thus improving the speed of synced slot creation on the standby in most scenarios. By using this approach, the limitation that prevents users from performing table sync while failover is enabled can be removed. In previous versions, this restriction existed because table sync slots were often incompletely synchronized to the standby (the slots on the primary could not catch up with the synced slots). With this approach, the table sync slots can be efficiently synced to the standby in most cases. However, there could still be rare cases where the WAL around the remote restart_lsn has been removed on the standby; in that case we will try to reserve the last remaining WAL and mark the slot as temporary, and these temporary slots will be converted to persistent once the remote restart_lsn catches up. We think this idea is promising, and here is the V58 patch set which tries to implement it. The summary of changes for each patch is as follows: V58-0001 1) Enables failover for table sync slots. 2) Removes the restriction on table sync when failover is enabled. 3) Removes tri-state handling for the failover state. 4) Renames failoverstate to failover. 5) Addresses Peter's comments[1]. V58-0002 1) Adds documentation about how to resume logical replication after failover. 2) Doesn't sync temporary slots from the primary server anymore. 3) Fixes one spinlock miss. 4) Fixes one CFbot warning. 5) Fixes a bug where last_update_time is not initialized. 6) Reserves WAL based on the remote restart_lsn. 7) Improves and adjusts the tests. 8) Removes the separate function wait_for_primary_slot_catchup() and integrates its logic of marking the slot as ready into the main loop. 9) Removes the 'i' state of sync_state. The slots that need to wait for the primary to catch up will be marked as TEMPORARY, and they will be converted to PERSISTENT once the remote restart_lsn catches up. Thanks Shveta for working on 1) to 4). V58-0003 Rebases the tests. V58-0004: Addresses Bertrand's comments[2]. Thanks Shveta for working on this. TODO: Add documentation to guide users on how to identify whether the table sync slots and the main slot are READY, so that logical replication can be resumed by subscribing to the new primary.
[1] https://www.postgresql.org/message-id/CAHut%2BPvbbPz1%3DT4bzY0_GotUK460Eih41Twjt%3DczJ1z2J8SGEw%40mail.gmail.com [2] https://www.postgresql.org/message-id/ZZa4pLFCe2mAks1m%40ip-10-97-1-34.eu-west-3.compute.internal Best Regards, Hou zj
Attachment
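A sketch of how the intermediate state described above could be observed on the standby ('sync_state' is the column used by this patch set and is assumed here; the exact column set may differ across patch versions):

  -- Synced slots that are still waiting for the primary to catch up stay
  -- temporary; once the remote restart_lsn catches up they are persisted.
  SELECT slot_name, temporary, sync_state
  FROM pg_replication_slots;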
On Tuesday, January 9, 2024 9:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v57-0001. Thanks for the comments! > > ====== > doc/src/sgml/protocol.sgml > > 1. CREATE_REPLICATION_SLOT ... FAILOVER > > + <varlistentry> > + <term><literal>FAILOVER [ <replaceable > class="parameter">boolean</replaceable> ]</literal></term> > + <listitem> > + <para> > + If true, the slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed after failover. > + </para> > + </listitem> > + </varlistentry> > > This syntax says passing the boolean value is optional. So the default needs to > the specified here in the docs (like what the TWO_PHASE option does). > > ~~~ > > 2. ALTER_REPLICATION_SLOT ... FAILOVER > > + <variablelist> > + <varlistentry> > + <term><literal>FAILOVER [ <replaceable > class="parameter">boolean</replaceable> ]</literal></term> > + <listitem> > + <para> > + If true, the slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed after failover. > + </para> > + </listitem> > + </varlistentry> > + </variablelist> > > This syntax says passing the boolean value is optional. So it needs to be > specified here in the docs that not passing a value would be the same as > passing the value true. The behavior that "not passing a value would be the same as passing the value true " is due to the rule of defGetBoolean(). And all the options of commands in this document behave the same in this case, therefore I think we'd better add document for it in a general place in a separate patch/thread instead of mentioning this in each option's paragraph. > > ====== > doc/src/sgml/ref/alter_subscription.sgml > > 3. > + If <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link> > + is enabled, you can temporarily disable it in order to execute > these commands. > > /in order to/to/ This part has been removed due to design change. > > ~~~ > > 4. > + <para> > + When altering the > + <link > linkend="sql-createsubscription-params-with-slot-name"><literal>slot_nam > e</literal></link>, > + the <literal>failover</literal> property of the new slot may > differ from the > + <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link> > + parameter specified in the subscription. When creating the slot, > + ensure the slot failover property matches the > + <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link> > + parameter value of the subscription. > + </para> > > 4a. > the <literal>failover</literal> property of the new slot may differ > > Maybe it would be more clear if that said "the failover property value of the > named slot...". Changed. > > ~ > > 4b. > In the "failover property matches" part should that failover also be rendered as > <literal> like before in the same paragraph? Added. > > ====== > doc/src/sgml/system-views.sgml > > 5. > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>failover</structfield> <type>bool</type> > + </para> > + <para> > + True if this logical slot is enabled to be synced to the physical > + standbys so that logical replication can be resumed from the new > primary > + after failover. Always false for physical slots. > + </para></entry> > + </row> > > /True if this logical slot is enabled.../True if this is a logical slot enabled.../ Changed. 
> > ====== > src/backend/commands/subscriptioncmds.c > > 6. CreateSubscription > > + /* > + * Even if failover is set, don't create the slot with failover > + * enabled. Will enable it once all the tables are synced and > + * ready. The intention is that if failover happens at the time of > + * table-sync, user should re-launch the subscription instead of > + * relying on main slot (if synced) with no table-sync data > + * present. When the subscription has no tables, leave failover as > + * false to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to > + * work. > + */ > + if (opts.failover && !opts.copy_data && tables != NIL) > + failover_enabled = true; > > AFAICT it might be possible for this to set failover_enabled = true if copy_data > is false. So failover_enabled would be true when later > calling: > walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled, > failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL); > > Isn't that contrary to what this comment said: "Even if failover is set, don't > create the slot with failover enabled" This part has been removed due to design change. > > ~~~ > > 7. AlterSubscription. case ALTER_SUBSCRIPTION_OPTIONS: > > + if (!sub->slotname) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot set failover for subscription that does not have slot > + name"))); > > /for subscription that does not have slot name/for a subscription that does not > have a slot name/ Changed. > > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > 8. > + if (PQresultStatus(res) != PGRES_COMMAND_OK) ereport(ERROR, > + (errcode(ERRCODE_PROTOCOL_VIOLATION), > + errmsg("could not alter replication slot \"%s\"", slotname))); > > This used to display the error message like > pchomp(PQerrorMessage(conn->streamConn)) but it was removed. Is it OK? I added this back. > > ====== > src/backend/replication/logical/tablesync.c > > 9. > + if (MySubscription->twophasestate == > + LOGICALREP_TWOPHASE_STATE_PENDING) > + ereport(LOG, > + /* translator: %s is a subscription option */ (errmsg("logical > + replication apply worker for subscription \"%s\" > will restart so that %s can be enabled", > + MySubscription->name, "two_phase"))); > + > + if (MySubscription->failoverstate == > + LOGICALREP_FAILOVER_STATE_PENDING) > + ereport(LOG, > + /* translator: %s is a subscription option */ (errmsg("logical > + replication apply worker for subscription \"%s\" > will restart so that %s can be enabled", > + MySubscription->name, "failover"))); > > Those errors have multiple %s, so the translator's comment should say "the > 2nd %s is a..." This part has been removed due to design change. > > ~~~ > > 10. > void > -UpdateTwoPhaseState(Oid suboid, char new_state) > +EnableTwoPhaseFailoverTriState(Oid suboid, bool enable_twophase, > + bool enable_failover) > > I felt the function name was a bit confusing. Maybe it is simpler to call it like > "EnableTriState" or "EnableSubTriState" -- the parameters anyway specify what > actual state(s) will be set. This part has been removed due to design change. > > ====== > src/backend/replication/logical/worker.c > > 11. 
> + /* Update twophase and/or failover */ > + EnableTwoPhaseFailoverTriState(MySubscription->oid, twophase_pending, > + failover_pending); > + if (twophase_pending) > + MySubscription->twophasestate = > LOGICALREP_TWOPHASE_STATE_ENABLED; > + > + if (failover_pending) > + MySubscription->failoverstate = LOGICALREP_FAILOVER_STATE_ENABLED; > > Can't you pass the MySubscription as a parameter and then the > EnableTwoPhaseFailoverTriState can also set these > LOGICALREP_TWOPHASE_STATE_ENABLED/LOGICALREP_FAILOVER_STATE_EN > ABLED > states within the Enable* function? This part has been removed due to design change. > > ====== > src/backend/replication/repl_gram.y > > 12. > %token K_CREATE_REPLICATION_SLOT > %token K_DROP_REPLICATION_SLOT > +%token K_ALTER_REPLICATION_SLOT > > and > > + create_replication_slot drop_replication_slot alter_replication_slot > + identify_system read_replication_slot timeline_history show > + upload_manifest > > and > > | create_replication_slot > | drop_replication_slot > + | alter_replication_slot > > and > > | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; } > | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; } > + | K_ALTER_REPLICATION_SLOT { $$ = "alter_replication_slot"; } > > etc. > > ~ > > Although it makes no difference IMO it is more natural to code everything in > the order: create, alter, drop. > > ====== > src/backend/replication/repl_scanner.l > > 13. > CREATE_REPLICATION_SLOT { return K_CREATE_REPLICATION_SLOT; } > DROP_REPLICATION_SLOT { return K_DROP_REPLICATION_SLOT; } > +ALTER_REPLICATION_SLOT { return K_ALTER_REPLICATION_SLOT; } > > and > > case K_CREATE_REPLICATION_SLOT: > case K_DROP_REPLICATION_SLOT: > + case K_ALTER_REPLICATION_SLOT: > > Although it makes no difference IMO it is more natural to code everything in > the order: create, alter, drop. Personally, I am not sure if it looks better, so I didn’t change this. > > ====== > src/backend/replication/slot.c > > 14. > + if (SlotIsPhysical(MyReplicationSlot)) > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot use %s with a physical replication slot", > + "ALTER_REPLICATION_SLOT")); > > /with a/for a/ This is to be consistent with another error message, so I didn’t change this. errmsg("cannot use %s with a logical replication slot", "READ_REPLICATION_SLOT")); > > ====== > src/backend/replication/walsender.c > > 15. > +static void > +ParseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover) > +{ > + ListCell *lc; > + bool failover_given = false; > + > + /* Parse options */ > + foreach(lc, cmd->options) > + { > + DefElem *defel = (DefElem *) lfirst(lc); > > AFAIK there are some new-style macros now you can use for this code. > e.g. foreach_ptr? See [1]. Changed. > > ~~~ > > 16. > + if (strcmp(defel->defname, "failover") == 0) { if (failover_given) > + ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or > + redundant options"))); failover_given = true; *failover = > + defGetBoolean(defel); } > > The documented syntax showed that passing the boolean value for the > FAILOVER option is not mandatory. Does this code work if the boolean value is > not passed? It works, defGetBoolean will handle this case. > > ====== > src/bin/psql/tab-complete.c > > 17. > I think "ALTER SUBSCRIPTION ... SET (failover)" is possible, but the ALTER > SUBSCRIPTION tab completion code is missing. Added. > ====== > .../t/050_standby_failover_slots_sync.pl > > 19. 
> + > +# Copyright (c) 2023, PostgreSQL Global Development Group > + > > /2023/2024/ Changed. > > ~~~ > > 20. > +# Create another subscription (using the same slot created above) that > +enables # failover. > +$subscriber1->safe_psql( > + 'postgres', qq[ > + CREATE TABLE tab_int (a int PRIMARY KEY); CREATE SUBSCRIPTION > +regress_mysub1 CONNECTION '$publisher_connstr' > PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data=false, > failover = true, create_slot = false); > > The comment should not say "Create another subscription" because this is the > first subscription being created. > > /another/a/ Changed. > > ~~~ > > 21. > +################################################## > +# Test if changing the failover property of a subscription updates the > +# corresponding failover property of the slot. > +################################################## > > /Test if/Test that/ Changed. > > ====== > src/test/regress/sql/subscription.sql > > 22. > +CREATE SUBSCRIPTION regress_testsub CONNECTION > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, > failover = true); > + > +\dRs+ > > This is currently only testing the explicit "failover=true". > > Maybe you can also test the other kinds work as expected: > - explicit "SET (failover=false)" > - explicit "SET (failover)" with no value specified I think these tests don't add enough value to catch future bugs, so I prefer not to add these. Best Regards, Hou zj
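To illustrate the new-style list macro referred to in comment 15 above, here is a minimal sketch of the options loop written with foreach_ptr from pg_list.h. The function and variable names mirror the quoted hunk but are reproduced approximately, not taken verbatim from the patch:

static void
ParseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)
{
    bool        failover_given = false;

    /* foreach_ptr declares "defel" and casts each list cell to DefElem * */
    foreach_ptr(DefElem, defel, cmd->options)
    {
        if (strcmp(defel->defname, "failover") == 0)
        {
            if (failover_given)
                ereport(ERROR,
                        (errcode(ERRCODE_SYNTAX_ERROR),
                         errmsg("conflicting or redundant options")));
            failover_given = true;

            /*
             * defGetBoolean() treats a bare option (no value given) as
             * true, which is also why comment 16's case works.
             */
            *failover = defGetBoolean(defel);
        }
    }
}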
On Tue, Jan 9, 2024 at 5:44 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > V58-0002 > +static bool +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) { ... + /* Slot ready for sync, so sync it. */ + else + { + /* + * Sanity check: With hot_standby_feedback enabled and + * invalidations handled appropriately as above, this should never + * happen. + */ + if (remote_slot->restart_lsn < slot->data.restart_lsn) + elog(ERROR, + "cannot synchronize local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization" + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(slot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); ... } I was thinking about the above code in the patch and as far as I can think this can only occur if the same name slot is re-created with prior restart_lsn after the existing slot is dropped. Normally, the newly created slot (with the same name) will have higher restart_lsn but one can mimic it by copying some older slot by using pg_copy_logical_replication_slot(). I don't think as mentioned in comments even if hot_standby_feedback is temporarily set to off, the above shouldn't happen. It can only lead to invalidated slots on standby. To close the above race, I could think of the following ways: 1. Drop and re-create the slot. 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves ahead of local_slot's LSN then we can update it; but as mentioned in your previous comment, we need to update all other fields as well. If we follow this then we probably need to have a check for catalog_xmin as well. Now, related to this the other case which needs some handling is what if the remote_slot's restart_lsn is greater than local_slot's restart_lsn but it is a re-created slot with the same name. In that case, I think the other properties like 'two_phase', 'plugin' could be different. So, is simply copying those sufficient or do we need to do something else as well? -- With Regards, Amit Kapila.
Here are some review comments for the patch v58-0001 ====== doc/src/sgml/catalogs.sgml 1. + <para> + If true, the associated replication slots (i.e. the main slot and the + table sync slots) in the upstream database are enabled to be + synchronized to the physical standbys. + </para></entry> It seems the other single-sentence descriptions on this page have no period (.) so for consistency maybe you should remove it here also. ====== src/backend/commands/subscriptioncmds.c 2. AlterSubscription + /* + * Do not allow changing the failover state if the + * subscription is enabled. This is because the failover + * state of the slot on the publisher cannot be modified if + * the slot is currently being acquired by the apply + * worker. + */ /being acquired/acquired/ ~~~ 3. values[Anum_pg_subscription_subfailover - 1] = BoolGetDatum(opts.failover); replaces[Anum_pg_subscription_subfailover - 1] = true; /* * The failover state of the slot should be changed after * the catalog update is completed. */ set_failover = true; AFAICT you don't need to introduce a new variable 'set_failover'. Instead, you can test like: BEFORE if (set_failover) AFTER if (replaces[Anum_pg_subscription_subfailover - 1]) ====== src/backend/replication/logical/tablesync.c 4. walrcv_create_slot(LogRepWorkerWalRcvConn, slotname, false /* permanent */ , false /* two_phase */ , + MySubscription->failover /* failover */ , CRS_USE_SNAPSHOT, origin_startpos); The "/* failover */ comment is unnecessary now that you pass the boolean field with the same descriptive name. ====== src/include/catalog/pg_subscription.h 5. CATALOG + bool subfailover; /* True if the associated replication slots + * (i.e. the main slot and the table sync + * slots) in the upstream database are enabled + * the upstream database are enabled to be + * synchronized to the physical standbys. */ + The wording of the comment is broken (it says "are enabled" 2x). SUGGESTION True if the associated replication slots (i.e. the main slot and the table sync slots) in the upstream database are enabled to be synchronized to the physical standbys. ~~~ 6. Subscription + bool failover; /* Indicates if the associated replication + * slots (i.e. the main slot and the table sync + * slots) in the upstream database are enabled + * to be synchronized to the physical + * standbys. */ This comment can say "True if...", so it will be the same as the earlier CATALOG comment for 'subfailover'. ====== Kind Regards, Peter Smith. Fujitsu Australia.
On Tue, Jan 9, 2024 at 5:44 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > comments on 0002 1. +/* Worker's nap time in case of regular activity on the primary server */ +#define WORKER_DEFAULT_NAPTIME_MS 10L /* 10 ms */ + +/* Worker's nap time in case of no-activity on the primary server */ +#define WORKER_INACTIVITY_NAPTIME_MS 10000L /* 10 sec */ Instead of directly switching between 10ms to 10s shouldn't we increase the nap time gradually? I mean it can go beyond 10 sec as well but instead of directly switching from 10ms to 10 sec we can increase it every time with some multiplier and keep a max limit up to which it can grow. Although we can reset back to 10ms directly as soon as we observe some activity. 2. SlotSyncWorkerCtxStruct add this to typedefs. list file 3. +/* + * Update local slot metadata as per remote_slot's positions + */ +static void +local_slot_update(RemoteSlot *remote_slot) +{ + Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE); + + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, + remote_slot->catalog_xmin); + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, + remote_slot->restart_lsn); +} IIUC on the standby we just want to overwrite what we get from primary no? If so why we are using those APIs that are meant for the actual decoding slots where it needs to take certain logical decisions instead of mere overwriting? 4. +/* + * Helper function for drop_obsolete_slots() + * + * Drops synced slot identified by the passed in name. + */ +static void +drop_synced_slots_internal(const char *name, bool nowait) Suggestion to add one line to explain no wait in the header 5. +/* + * Helper function to check if local_slot is present in remote_slots list. + * + * It also checks if logical slot is locally invalidated i.e. invalidated on + * the standby but valid on the primary server. If found so, it sets + * locally_invalidated to true. + */ Instead of saying "but valid on the primary server" better to mention it in the remote_slots list, because here this function is just checking the remote_slots list regardless of whether the list came from. Mentioning primary seems like it might fetch directly from the primary in this function so this is a bit confusing. 6. +/* + * Check that all necessary GUCs for slot synchronization are set + * appropriately. If not, raise an ERROR. + */ +static void +validate_slotsync_parameters(char **dbname) The function name just says 'validate_slotsync_parameters' but it also gets the dbname so I think it better we change the name accordingly also instead of passing dbname as a parameter just return it directly. There is no need to pass this extra parameter and make the function return void. 7. + tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, tupslot); + Assert(tuple_ok); /* It must return one tuple */ Comments say 'It must return one tuple' but asserting just for at least one tuple shouldn't we enhance assert so that it checks that we got exactly one tuple? 8. /* No need to check further, return that we are cascading standby */ + *am_cascading_standby = true; we are not returning immediately we are just setting am_cascading_standby to true so adjust comments accordingly 9. + /* No need to check further, return that we are cascading standby */ + *am_cascading_standby = true; + } + else + { + /* We are a normal standby. 
*/ Single-line comments do not follow the uniform pattern for the full stop, either use a full stop for all single-line comments or none, at least follow the same rule in a file or nearby comments. 10. + errmsg("exiting from slot synchronization due to bad configuration"), + errhint("%s must be defined.", "primary_slot_name")); Why we are using the constant string "primary_slot_name" as a variable in this error formatting? 11. + /* + * Hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. I do not like capitalizing the first letter of the 'hot_standby_feedback' which is a GUC parameter -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
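As a concrete reading of comment 1 above, a back-off along these lines would double the nap time while the primary is idle and snap back to the minimum as soon as a slot is updated. This is a sketch only: the helper name wait_for_slot_activity() follows later patch versions, but the constants and the wait-event argument are placeholders, not the patch's final code:

/* Sketch: bounds for the sync worker's nap time (values are illustrative) */
#define MIN_SLOTSYNC_NAPTIME_MS     10L         /* 10 ms */
#define MAX_SLOTSYNC_NAPTIME_MS     10000L      /* 10 s */

static long sleep_ms = MIN_SLOTSYNC_NAPTIME_MS;

static void
wait_for_slot_activity(bool some_slot_updated)
{
    if (some_slot_updated)
        sleep_ms = MIN_SLOTSYNC_NAPTIME_MS;     /* activity seen: be responsive again */
    else
        sleep_ms = Min(sleep_ms * 2, MAX_SLOTSYNC_NAPTIME_MS); /* idle: back off gradually */

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     sleep_ms,
                     0 /* wait-event id omitted in this sketch */);
    ResetLatch(MyLatch);
}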
On Wednesday, January 10, 2024 2:26 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jan 9, 2024 at 5:44 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > comments on 0002 Thanks for the comments ! > > 1. > +/* Worker's nap time in case of regular activity on the primary server */ > +#define WORKER_DEFAULT_NAPTIME_MS 10L /* 10 ms */ > + > +/* Worker's nap time in case of no-activity on the primary server */ > +#define WORKER_INACTIVITY_NAPTIME_MS 10000L /* 10 sec > */ > > Instead of directly switching between 10ms to 10s shouldn't we increase the > nap time gradually? I mean it can go beyond 10 sec as well but instead of > directly switching from 10ms to 10 sec we can increase it every time with some > multiplier and keep a max limit up to which it can grow. Although we can reset > back to 10ms directly as soon as we observe some activity. Agreed. I changed the strategy similar to what we do in the walsummarizer. > > 2. > SlotSyncWorkerCtxStruct add this to typedefs. list file > > 3. > +/* > + * Update local slot metadata as per remote_slot's positions */ static > +void local_slot_update(RemoteSlot *remote_slot) { > +Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE); > + > + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); > + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, > + remote_slot->catalog_xmin); > + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, > + remote_slot->restart_lsn); > +} > > IIUC on the standby we just want to overwrite what we get from primary no? If > so why we are using those APIs that are meant for the actual decoding slots > where it needs to take certain logical decisions instead of mere overwriting? I think we don't have a strong reason to use these APIs, but it was convenient to use these APIs as they can take care of updating the slots info and will call functions like, ReplicationSlotsComputeRequiredXmin, ReplicationSlotsComputeRequiredLSN internally. Or do you prefer directly overwriting the fields and call these manually ? > > 4. > +/* > + * Helper function for drop_obsolete_slots() > + * > + * Drops synced slot identified by the passed in name. > + */ > +static void > +drop_synced_slots_internal(const char *name, bool nowait) > > Suggestion to add one line to explain no wait in the header The 'nowait' flag is not necessary now, so removed. > > 5. > +/* > + * Helper function to check if local_slot is present in remote_slots list. > + * > + * It also checks if logical slot is locally invalidated i.e. > +invalidated on > + * the standby but valid on the primary server. If found so, it sets > + * locally_invalidated to true. > + */ > > Instead of saying "but valid on the primary server" better to mention it in the > remote_slots list, because here this function is just checking the remote_slots > list regardless of whether the list came from. Mentioning primary seems like it > might fetch directly from the primary in this function so this is a bit confusing. Adjusted. > > 6. > +/* > + * Check that all necessary GUCs for slot synchronization are set > + * appropriately. If not, raise an ERROR. > + */ > +static void > +validate_slotsync_parameters(char **dbname) > > > The function name just says 'validate_slotsync_parameters' but it also gets the > dbname so I think it better we change the name accordingly also instead of > passing dbname as a parameter just return it directly. > There > is no need to pass this extra parameter and make the function return void. Renamed. > > 7. 
> + tupslot = MakeSingleTupleTableSlot(res->tupledesc, > + &TTSOpsMinimalTuple); tuple_ok = > + tuplestore_gettupleslot(res->tuplestore, true, false, tupslot); > + Assert(tuple_ok); /* It must return one tuple */ > > Comments say 'It must return one tuple' but asserting just for at least one tuple > shouldn't we enhance assert so that it checks that we got exactly one tuple? Changed to use tuplestore_tuple_count. > > 8. > /* No need to check further, return that we are cascading standby */ > + *am_cascading_standby = true; > > we are not returning immediately we are just setting am_cascading_standby to > true so adjust comments accordingly Adjusted. > > 9. > + /* No need to check further, return that we are cascading standby */ > + *am_cascading_standby = true; } else { > + /* We are a normal standby. */ > > Single-line comments do not follow the uniform pattern for the full stop, either > use a full stop for all single-line comments or none, at least follow the same rule > in a file or nearby comments. Adjusted. > > 10. > + errmsg("exiting from slot synchronization due to bad configuration"), > + errhint("%s must be defined.", "primary_slot_name")); > > Why we are using the constant string "primary_slot_name" as a variable in this > error formatting? It was suggested to make it friendly to the translator, as the GUC doesn't needs to be translated and it can avoid adding multiple similar message to be translated. > > 11. > + /* > + * Hot_standby_feedback must be enabled to cooperate with the physical > + * replication slot, which allows informing the primary about the xmin > + and > + * catalog_xmin values on the standby. > > I do not like capitalizing the first letter of the 'hot_standby_feedback' which is a > GUC parameter Changed. Here is the V59 patch set which addressed above comments and comments from Peter[1]. [1] https://www.postgresql.org/message-id/CAHut%2BPu34_dYj9MnV6n3cPsssEx57YaO6Pg0d9mDryQZX2Mx3g%40mail.gmail.com Best Regards, Hou zj
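For comment 7 above, the stricter check can be expressed with tuplestore_tuple_count(); a small sketch under the same assumptions as the quoted hunk (res being the walrcv_exec() result):

/* The remote query is expected to return exactly one tuple */
Assert(tuplestore_tuple_count(res->tuplestore) == 1);

tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
tuple_ok = tuplestore_gettupleslot(res->tuplestore, true, false, tupslot);
Assert(tuple_ok);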

Here are some review comments for patch v58-0002 (FYI - I quickly checked with the latest v59-0002 and AFAIK all these review comments below are still relevant) ====== Commit message 1. If a logical slot is invalidated on the primary, slot on the standby is also invalidated. ~ /slot on the standby/then that slot on the standby/ ====== doc/src/sgml/logicaldecoding.sgml 2. In order to resume logical replication after failover from the synced logical slots, it is required that 'conninfo' in subscriptions are altered to point to the new primary server using ALTER SUBSCRIPTION ... CONNECTION. It is recommended that subscriptions are first disabled before promoting the standby and are enabled back once these are altered as above after failover. ~ Minor rewording mainly to reduce a long sentence. SUGGESTION To resume logical replication after failover from the synced logical slots, the subscription's 'conninfo' must be altered to point to the new primary server. This is done using ALTER SUBSCRIPTION ... CONNECTION. It is recommended that subscriptions are first disabled before promoting the standby and are enabled back after altering the connection string. ====== doc/src/sgml/system-views.sgml 3. + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>synced</structfield> <type>bool</type> + </para> + <para> + True if this logical slot was synced from a primary server. + </para> + <para> SUGGESTION True if this is a logical slot that was synced from a primary server. ====== src/backend/access/transam/xlogrecovery.c 4. + /* + * Shutdown the slot sync workers to prevent potential conflicts between + * user processes and slotsync workers after a promotion. + * + * We do not update the 'synced' column from true to false here, as any + * failed update could leave some slot's 'synced' column as false. This + * could cause issues during slot sync after restarting the server as a + * standby. While updating after switching to the new timeline is an + * option, it does not simplify the handling for 'synced' column. + * Therefore, we retain the 'synced' column as true after promotion as they + * can provide useful information about their origin. + */ Minor comment wording changes. BEFORE ...any failed update could leave some slot's 'synced' column as false. SUGGESTION ...any failed update could leave 'synced' column false for some slots. ~ BEFORE Therefore, we retain the 'synced' column as true after promotion as they can provide useful information about their origin. SUGGESTION Therefore, we retain the 'synced' column as true after promotion as it may provide useful information about the slot origin. ====== src/backend/replication/logical/slotsync.c 5. + * While creating the slot on physical standby, if the local restart_lsn and/or + * local catalog_xmin is ahead of those on the remote then the worker cannot + * create the local slot in sync with the primary server because that would + * mean moving the local slot backwards and the standby might not have WALs + * retained for old LSN. In this case, the worker will mark the slot as + * RS_TEMPORARY. Once the primary server catches up, it will move the slot to + * RS_PERSISTENT and will perform the sync periodically. /will move the slot to RS_PERSISTENT/will mark the slot as RS_PERSISTENT/ ~~~ 6. drop_synced_slots_internal +/* + * Helper function for drop_obsolete_slots() + * + * Drops synced slot identified by the passed in name. 
+ */ +static void +drop_synced_slots_internal(const char *name, bool nowait) +{ + Assert(MyReplicationSlot == NULL); + + ReplicationSlotAcquire(name, nowait); + + Assert(MyReplicationSlot->data.synced); + + ReplicationSlotDropAcquired(); +} IMO you don't need this function. AFAICT it is only called from one place and does not result in fewer lines of code. ~~~ 7. get_local_synced_slots + /* Check if it is logical synchronized slot */ + if (s->in_use && SlotIsLogical(s) && s->data.synced) + { + local_slots = lappend(local_slots, s); + } Do you need to check SlotIsLogical(s) here? I thought s->data.synced can never be true for physical slots. I felt you could write this like blelow: if (s->in_use s->data.synced) { Assert(SlotIsLogical(s)); local_slots = lappend(local_slots, s); } ~~~ 8. check_sync_slot_on_remote +static bool +check_sync_slot_on_remote(ReplicationSlot *local_slot, List *remote_slots, + bool *locally_invalidated) +{ + ListCell *lc; + + foreach(lc, remote_slots) + { + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc); I think you can use the new style foreach_ptr list macros here. ~~~ 9. drop_obsolete_slots +drop_obsolete_slots(List *remote_slot_list) +{ + List *local_slots = NIL; + ListCell *lc; + + local_slots = get_local_synced_slots(); + + foreach(lc, local_slots) + { + ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc); I think you can use the new style foreach_ptr list macros here. ~~~ 10. reserve_wal_for_slot + Assert(slot != NULL); + Assert(slot->data.restart_lsn == InvalidXLogRecPtr); You can use the macro XLogRecPtrIsInvalid(lot->data.restart_lsn) ~~~ 11. update_and_persist_slot +/* + * Update the LSNs and persist the slot for further syncs if the remote + * restart_lsn and catalog_xmin have caught up with the local ones. Otherwise, + * persist the slot and return. + * + * Return true if the slot is marked READY, otherwise false. + */ +static bool +update_and_persist_slot(RemoteSlot *remote_slot) 11a. The comment says "Otherwise, persist the slot and return" but there is a return false which doesn't seem to persist anything so it seems contrary to the comment. ~ 11b. "slot is marked READY" -- IIUC the synced states no longer exist in v58 so this comment maybe should not be referring to READY anymore. Or maybe there just needs to be more explanation about the difference between 'synced' and the state you call "READY". ~~~ 12. synchronize_one_slot + * The slot is created as a temporary slot and stays in same state until the + * initialization is complete. The initialization is considered to be completed + * once the remote_slot catches up with locally reserved position and local + * slot is updated. The slot is then persisted. I think this comment is related to the "READY" mentioned by update_and_persist_slot. Still, perhaps the terminology needs to be made consistent across all these comments -- e.g. "considered to be completed" versus "READY" versus "sync-ready" etc. ~~~ 13. + ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY, + remote_slot->two_phase, + remote_slot->failover, + true); This review comment is similar to elsewhere in this post. Consider commenting on the new parameter like "true /* synced */" ~~~ 14. synchronize_slots + /* + * It is possible to get null values for LSN and Xmin if slot is + * invalidated on the primary server, so handle accordingly. + */ + remote_slot->confirmed_lsn = !slot_attisnull(tupslot, 3) ? 
+ DatumGetLSN(slot_getattr(tupslot, 3, &isnull)) : + InvalidXLogRecPtr; + + remote_slot->restart_lsn = !slot_attisnull(tupslot, 4) ? + DatumGetLSN(slot_getattr(tupslot, 4, &isnull)) : + InvalidXLogRecPtr; + + remote_slot->catalog_xmin = !slot_attisnull(tupslot, 5) ? + DatumGetTransactionId(slot_getattr(tupslot, 5, &isnull)) : + InvalidTransactionId; Isn't this the same functionality as the older v51 code that was written differently? I felt the old style (without ignoring the 'isnull') was more readable. v51 + remote_slot->confirmed_lsn = DatumGetLSN(slot_getattr(tupslot, 3, &isnull)); + if (isnull) + remote_slot->confirmed_lsn = InvalidXLogRecPtr; v58 + remote_slot->confirmed_lsn = !slot_attisnull(tupslot, 3) ? + DatumGetLSN(slot_getattr(tupslot, 3, &isnull)) : + InvalidXLogRecPtr; If you prefer a ternary, it might be cleaner to do it like: Datum d; ... d = slot_getattr(tupslot, 3, &isnull); remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d); ... ~~~ 15. + + /* Drop local slots that no longer need to be synced. */ + drop_obsolete_slots(remote_slot_list); + + /* Now sync the slots locally */ + foreach(lc, remote_slot_list) + { + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc); + + some_slot_updated |= synchronize_one_slot(wrconn, remote_slot); + } Here you can use the new list macro like foreach_ptr. ~~~ 16. ReplSlotSyncWorkerMain + wrconn = walrcv_connect(PrimaryConnInfo, true, false, + cluster_name[0] ? cluster_name : "slotsyncworker", + &err); + if (wrconn == NULL) + ereport(ERROR, + errcode(ERRCODE_CONNECTION_FAILURE), + errmsg("could not connect to the primary server: %s", err)); Typically, I saw other PG code doing "if (!wrconn)" instead of "if (wrconn == NULL)" ====== src/backend/replication/slotfuncs.c 17. create_physical_replication_slot ReplicationSlotCreate(name, false, temporary ? RS_TEMPORARY : RS_PERSISTENT, false, - false); + false, false); IMO passing parameters like "false, false, false" becomes a bit difficult to understand from the caller's POV so it might be good to comment on the parameter like: ReplicationSlotCreate(name, false, temporary ? RS_TEMPORARY : RS_PERSISTENT, false, false, false /* synced */); (there are a few other places like this where the same review comment applies) ~~~ 18. create_logical_replication_slot ReplicationSlotCreate(name, true, temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase, - failover); + failover, false); Same as above. Maybe comment on the parameter like "false /* synced */" ~~~ 19. pg_get_replication_slots case RS_INVAL_WAL_REMOVED: - values[i++] = CStringGetTextDatum("wal_removed"); + values[i++] = CStringGetTextDatum(SLOT_INVAL_WAL_REMOVED_TEXT); break; case RS_INVAL_HORIZON: - values[i++] = CStringGetTextDatum("rows_removed"); + values[i++] = CStringGetTextDatum(SLOT_INVAL_HORIZON_TEXT); break; case RS_INVAL_WAL_LEVEL: - values[i++] = CStringGetTextDatum("wal_level_insufficient"); + values[i++] = CStringGetTextDatum(SLOT_INVAL_WAL_LEVEL_TEXT); break; IMO this code and the #defines that it uses can be written and pushed as an independent patch. ====== src/backend/replication/walsender.c 20. CreateReplicationSlot ReplicationSlotCreate(cmd->slotname, false, cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT, - false, false); + false, false, false); Consider commenting the parameter like "false /* synced */" ~~~ 21. ReplicationSlotCreate(cmd->slotname, true, cmd->temporary ? 
RS_TEMPORARY : RS_EPHEMERAL, - two_phase, failover); + two_phase, failover, false); Consider commenting the parameter like "false /* synced */" ====== src/include/replication/slot.h 22. +/* + * The possible values for 'conflict_reason' returned in + * pg_get_replication_slots. + */ +#define SLOT_INVAL_WAL_REMOVED_TEXT "wal_removed" +#define SLOT_INVAL_HORIZON_TEXT "rows_removed" +#define SLOT_INVAL_WAL_LEVEL_TEXT "wal_level_insufficient" IMO these #defines and also the code in pg_get_replication_slots() that uses them can be written and pushed as an independent patch. ====== .../t/050_standby_failover_slots_sync.pl 23. +# Wait for the standby to start sync +$standby1->start; But there is no waiting here? Maybe the comment should say like "Start the standby so that slot syncing can begin" ~~~ 24. +# Wait for the standby to finish sync +my $offset = -s $standby1->logfile; +$standby1->wait_for_log( + qr/LOG: ( [A-Z0-9]+:)? newly locally created slot \"lsub1_slot\" is sync-ready now/, + $offset); SUGGESTION # Wait for the standby to finish slot syncing ~~~ 25. +# Confirm that logical failover slot is created on the standby and is sync +# ready. +is($standby1->safe_psql('postgres', + q{SELECT failover, synced FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}), + "t|t", + 'logical slot has failover as true and synced as true on standby'); SUGGESTION # Confirm that the logical failover slot is created on the standby and is flagged as 'synced' ~~~ 26. +$subscriber1->safe_psql( + 'postgres', qq[ + CREATE TABLE tab_int (a int PRIMARY KEY); + ALTER SUBSCRIPTION regress_mysub1 REFRESH PUBLICATION; +]); + +$subscriber1->wait_for_subscription_sync; Add a comment like # Subscribe to the new table data and wait for it to arrive ~~~ 27. +# Disable hot_standby_feedback temporarily to stop slot sync worker otherwise +# the concerned testing scenarios here may be interrupted by different error: +# 'ERROR: replication slot is active for PID ..' + +$standby1->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;'); +$standby1->restart; Remove the blank line. ~~~ 28. +is($standby1->safe_psql('postgres', + q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}), + 'lsub1_slot', + 'synced slot retained on the new primary'); There should be some comment like: SUGGESTION # Confirm the synced slot 'lsub1_slot' is retained on the new primary ~~~ 29. +# Confirm that data in tab_int replicated on subscriber +is( $subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}), + "20", + 'data replicated from the new primary'); /replicated on subscriber/replicated on the subscriber/ ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi, On Wed, Jan 10, 2024 at 12:23:14PM +0000, Zhijie Hou (Fujitsu) wrote: > On Wednesday, January 10, 2024 2:26 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); > > + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, > > + remote_slot->catalog_xmin); > > + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, > > + remote_slot->restart_lsn); > > +} > > > > IIUC on the standby we just want to overwrite what we get from primary no? If > > so why we are using those APIs that are meant for the actual decoding slots > > where it needs to take certain logical decisions instead of mere overwriting? > > I think we don't have a strong reason to use these APIs, but it was convenient to > use these APIs as they can take care of updating the slots info and will call > functions like, ReplicationSlotsComputeRequiredXmin, > ReplicationSlotsComputeRequiredLSN internally. Or do you prefer directly overwriting > the fields and call these manually ? I'd vote for using the APIs as I think it will be harder to maintain if we are not using them (means ensure the "direct" overwriting still makes sense over time). FWIW, pg_failover_slots also rely on those APIs from what I can see. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jan 11, 2024 at 1:19 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Jan 10, 2024 at 12:23:14PM +0000, Zhijie Hou (Fujitsu) wrote: > > On Wednesday, January 10, 2024 2:26 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > + LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn); > > > + LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn, > > > + remote_slot->catalog_xmin); > > > + LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn, > > > + remote_slot->restart_lsn); > > > +} > > > > > > IIUC on the standby we just want to overwrite what we get from primary no? If > > > so why we are using those APIs that are meant for the actual decoding slots > > > where it needs to take certain logical decisions instead of mere overwriting? > > > > I think we don't have a strong reason to use these APIs, but it was convenient to > > use these APIs as they can take care of updating the slots info and will call > > functions like, ReplicationSlotsComputeRequiredXmin, > > ReplicationSlotsComputeRequiredLSN internally. Or do you prefer directly overwriting > > the fields and call these manually ? > > I'd vote for using the APIs as I think it will be harder to maintain if we are > not using them (means ensure the "direct" overwriting still makes sense over time). +1 PFA v60 which addresses: 1) Peter's comment in [1] 2) Peter's off list suggestion to convert sleep_quanta to sleep_ms and simplify the logic in wait_for_slot_activity() [1]: https://www.postgresql.org/message-id/CAHut%2BPtJAAPghc4GPt0k%3DjeMz1qu4H7mnaDifOHsVsMqi-qOLA%40mail.gmail.com thanks Shveta
On Thu, Jan 11, 2024 at 7:28 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v58-0002 Thank You for the feedback. These are addressed in v60. Please find my response inline for a few. > (FYI - I quickly checked with the latest v59-0002 and AFAIK all these > review comments below are still relevant) > > ====== > Commit message > > 1. > If a logical slot is invalidated on the primary, slot on the standby is also > invalidated. > > ~ > > /slot on the standby/then that slot on the standby/ > > ====== > doc/src/sgml/logicaldecoding.sgml > > 2. > In order to resume logical replication after failover from the synced > logical slots, it is required that 'conninfo' in subscriptions are > altered to point to the new primary server using ALTER SUBSCRIPTION > ... CONNECTION. It is recommended that subscriptions are first > disabled before promoting the standby and are enabled back once these > are altered as above after failover. > > ~ > > Minor rewording mainly to reduce a long sentence. > > SUGGESTION > To resume logical replication after failover from the synced logical > slots, the subscription's 'conninfo' must be altered to point to the > new primary server. This is done using ALTER SUBSCRIPTION ... > CONNECTION. It is recommended that subscriptions are first disabled > before promoting the standby and are enabled back after altering the > connection string. > > ====== > doc/src/sgml/system-views.sgml > > 3. > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>synced</structfield> <type>bool</type> > + </para> > + <para> > + True if this logical slot was synced from a primary server. > + </para> > + <para> > > SUGGESTION > True if this is a logical slot that was synced from a primary server. > > ====== > src/backend/access/transam/xlogrecovery.c > > 4. > + /* > + * Shutdown the slot sync workers to prevent potential conflicts between > + * user processes and slotsync workers after a promotion. > + * > + * We do not update the 'synced' column from true to false here, as any > + * failed update could leave some slot's 'synced' column as false. This > + * could cause issues during slot sync after restarting the server as a > + * standby. While updating after switching to the new timeline is an > + * option, it does not simplify the handling for 'synced' column. > + * Therefore, we retain the 'synced' column as true after promotion as they > + * can provide useful information about their origin. > + */ > > Minor comment wording changes. > > BEFORE > ...any failed update could leave some slot's 'synced' column as false. > SUGGESTION > ...any failed update could leave 'synced' column false for some slots. > > ~ > > BEFORE > Therefore, we retain the 'synced' column as true after promotion as > they can provide useful information about their origin. > SUGGESTION > Therefore, we retain the 'synced' column as true after promotion as it > may provide useful information about the slot origin. > > ====== > src/backend/replication/logical/slotsync.c > > 5. > + * While creating the slot on physical standby, if the local restart_lsn and/or > + * local catalog_xmin is ahead of those on the remote then the worker cannot > + * create the local slot in sync with the primary server because that would > + * mean moving the local slot backwards and the standby might not have WALs > + * retained for old LSN. In this case, the worker will mark the slot as > + * RS_TEMPORARY. 
Once the primary server catches up, it will move the slot to > + * RS_PERSISTENT and will perform the sync periodically. > > /will move the slot to RS_PERSISTENT/will mark the slot as RS_PERSISTENT/ > > ~~~ > > 6. drop_synced_slots_internal > +/* > + * Helper function for drop_obsolete_slots() > + * > + * Drops synced slot identified by the passed in name. > + */ > +static void > +drop_synced_slots_internal(const char *name, bool nowait) > +{ > + Assert(MyReplicationSlot == NULL); > + > + ReplicationSlotAcquire(name, nowait); > + > + Assert(MyReplicationSlot->data.synced); > + > + ReplicationSlotDropAcquired(); > +} > > IMO you don't need this function. AFAICT it is only called from one > place and does not result in fewer lines of code. > > ~~~ > > 7. get_local_synced_slots > > + /* Check if it is logical synchronized slot */ > + if (s->in_use && SlotIsLogical(s) && s->data.synced) > + { > + local_slots = lappend(local_slots, s); > + } > > Do you need to check SlotIsLogical(s) here? I thought s->data.synced > can never be true for physical slots. I felt you could write this like > blelow: > > if (s->in_use s->data.synced) > { > Assert(SlotIsLogical(s)); > local_slots = lappend(local_slots, s); > } > > ~~~ > > 8. check_sync_slot_on_remote > > +static bool > +check_sync_slot_on_remote(ReplicationSlot *local_slot, List *remote_slots, > + bool *locally_invalidated) > +{ > + ListCell *lc; > + > + foreach(lc, remote_slots) > + { > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc); > > I think you can use the new style foreach_ptr list macros here. > > ~~~ > > 9. drop_obsolete_slots > > +drop_obsolete_slots(List *remote_slot_list) > +{ > + List *local_slots = NIL; > + ListCell *lc; > + > + local_slots = get_local_synced_slots(); > + > + foreach(lc, local_slots) > + { > + ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc); > > I think you can use the new style foreach_ptr list macros here. > > ~~~ > > 10. reserve_wal_for_slot > > + Assert(slot != NULL); > + Assert(slot->data.restart_lsn == InvalidXLogRecPtr); > > You can use the macro XLogRecPtrIsInvalid(lot->data.restart_lsn) > > ~~~ > > 11. update_and_persist_slot > > +/* > + * Update the LSNs and persist the slot for further syncs if the remote > + * restart_lsn and catalog_xmin have caught up with the local ones. Otherwise, > + * persist the slot and return. > + * > + * Return true if the slot is marked READY, otherwise false. > + */ > +static bool > +update_and_persist_slot(RemoteSlot *remote_slot) > > 11a. > The comment says "Otherwise, persist the slot and return" but there is > a return false which doesn't seem to persist anything so it seems > contrary to the comment. > > ~ > > 11b. > "slot is marked READY" -- IIUC the synced states no longer exist in > v58 so this comment maybe should not be referring to READY anymore. Or > maybe there just needs to be more explanation about the difference > between 'synced' and the state you call "READY". > > ~~~ > > 12. synchronize_one_slot > > + * The slot is created as a temporary slot and stays in same state until the > + * initialization is complete. The initialization is considered to be completed > + * once the remote_slot catches up with locally reserved position and local > + * slot is updated. The slot is then persisted. > > I think this comment is related to the "READY" mentioned by > update_and_persist_slot. Still, perhaps the terminology needs to be > made consistent across all these comments -- e.g. "considered to be > completed" versus "READY" versus "sync-ready" etc. 
> > ~~~ > > 13. > + ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY, > + remote_slot->two_phase, > + remote_slot->failover, > + true); > > > This review comment is similar to elsewhere in this post. Consider > commenting on the new parameter like "true /* synced */" > > ~~~ > > 14. synchronize_slots > > + /* > + * It is possible to get null values for LSN and Xmin if slot is > + * invalidated on the primary server, so handle accordingly. > + */ > + remote_slot->confirmed_lsn = !slot_attisnull(tupslot, 3) ? > + DatumGetLSN(slot_getattr(tupslot, 3, &isnull)) : > + InvalidXLogRecPtr; > + > + remote_slot->restart_lsn = !slot_attisnull(tupslot, 4) ? > + DatumGetLSN(slot_getattr(tupslot, 4, &isnull)) : > + InvalidXLogRecPtr; > + > + remote_slot->catalog_xmin = !slot_attisnull(tupslot, 5) ? > + DatumGetTransactionId(slot_getattr(tupslot, 5, &isnull)) : > + InvalidTransactionId; > > Isn't this the same functionality as the older v51 code that was > written differently? I felt the old style (without ignoring the > 'isnull') was more readable. > > v51 > + remote_slot->confirmed_lsn = DatumGetLSN(slot_getattr(tupslot, 3, &isnull)); > + if (isnull) > + remote_slot->confirmed_lsn = InvalidXLogRecPtr; > > v58 > + remote_slot->confirmed_lsn = !slot_attisnull(tupslot, 3) ? > + DatumGetLSN(slot_getattr(tupslot, 3, &isnull)) : > + InvalidXLogRecPtr; > > If you prefer a ternary, it might be cleaner to do it like: We got a CFBot failure, where the v51's way was crashing in a 32-bit env, because there a Datum for int64 is regarded as a pointer and thus it resulted in NULL pointer access if slot_getattr() returned NULL. Please see DatumGetInt64(). > Datum d; > ... > d = slot_getattr(tupslot, 3, &isnull); > remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d); Okay, I see. This can also be done. I kind of missed this line earlier, I can consider it in the next version. > ~~~ > > 15. > + > + /* Drop local slots that no longer need to be synced. */ > + drop_obsolete_slots(remote_slot_list); > + > + /* Now sync the slots locally */ > + foreach(lc, remote_slot_list) > + { > + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc); > + > + some_slot_updated |= synchronize_one_slot(wrconn, remote_slot); > + } > > Here you can use the new list macro like foreach_ptr. > > ~~~ > > 16. ReplSlotSyncWorkerMain > > + wrconn = walrcv_connect(PrimaryConnInfo, true, false, > + cluster_name[0] ? cluster_name : "slotsyncworker", > + &err); > + if (wrconn == NULL) > + ereport(ERROR, > + errcode(ERRCODE_CONNECTION_FAILURE), > + errmsg("could not connect to the primary server: %s", err)); > > > Typically, I saw other PG code doing "if (!wrconn)" instead of "if > (wrconn == NULL)" > > > ====== > src/backend/replication/slotfuncs.c > > 17. create_physical_replication_slot > > ReplicationSlotCreate(name, false, > temporary ? RS_TEMPORARY : RS_PERSISTENT, false, > - false); > + false, false); > > IMO passing parameters like "false, false, false" becomes a bit > difficult to understand from the caller's POV so it might be good to > comment on the parameter like: > > ReplicationSlotCreate(name, false, > temporary ? RS_TEMPORARY : RS_PERSISTENT, false, > false, false /* synced */); > > (there are a few other places like this where the same review comment applies) > > ~~~ > > 18. create_logical_replication_slot > > ReplicationSlotCreate(name, true, > temporary ? RS_TEMPORARY : RS_EPHEMERAL, two_phase, > - failover); > + failover, false); > > Same as above. 
Maybe comment on the parameter like "false /* synced */" > > ~~~ > > 19. pg_get_replication_slots > > case RS_INVAL_WAL_REMOVED: > - values[i++] = CStringGetTextDatum("wal_removed"); > + values[i++] = CStringGetTextDatum(SLOT_INVAL_WAL_REMOVED_TEXT); > break; > > case RS_INVAL_HORIZON: > - values[i++] = CStringGetTextDatum("rows_removed"); > + values[i++] = CStringGetTextDatum(SLOT_INVAL_HORIZON_TEXT); > break; > > case RS_INVAL_WAL_LEVEL: > - values[i++] = CStringGetTextDatum("wal_level_insufficient"); > + values[i++] = CStringGetTextDatum(SLOT_INVAL_WAL_LEVEL_TEXT); > break; > > IMO this code and the #defines that it uses can be written and pushed > as an independent patch. Okay, let me review this one and #22 which mentions the same. > ====== > src/backend/replication/walsender.c > > 20. CreateReplicationSlot > > ReplicationSlotCreate(cmd->slotname, false, > cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT, > - false, false); > + false, false, false); > > Consider commenting the parameter like "false /* synced */" > > ~~~ > > 21. > ReplicationSlotCreate(cmd->slotname, true, > cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL, > - two_phase, failover); > + two_phase, failover, false); > > Consider commenting the parameter like "false /* synced */" > > ====== > src/include/replication/slot.h > > 22. > +/* > + * The possible values for 'conflict_reason' returned in > + * pg_get_replication_slots. > + */ > +#define SLOT_INVAL_WAL_REMOVED_TEXT "wal_removed" > +#define SLOT_INVAL_HORIZON_TEXT "rows_removed" > +#define SLOT_INVAL_WAL_LEVEL_TEXT "wal_level_insufficient" > > IMO these #defines and also the code in pg_get_replication_slots() > that uses them can be written and pushed as an independent patch. > > ====== > .../t/050_standby_failover_slots_sync.pl > > 23. > +# Wait for the standby to start sync > +$standby1->start; > > But there is no waiting here? Maybe the comment should say like "Start > the standby so that slot syncing can begin" > > ~~~ > > 24. > +# Wait for the standby to finish sync > +my $offset = -s $standby1->logfile; > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly locally created slot \"lsub1_slot\" is > sync-ready now/, > + $offset); > > SUGGESTION > # Wait for the standby to finish slot syncing > > ~~~ > > 25. > +# Confirm that logical failover slot is created on the standby and is sync > +# ready. > +is($standby1->safe_psql('postgres', > + q{SELECT failover, synced FROM pg_replication_slots WHERE slot_name > = 'lsub1_slot';}), > + "t|t", > + 'logical slot has failover as true and synced as true on standby'); > > SUGGESTION > # Confirm that the logical failover slot is created on the standby and > is flagged as 'synced' > > ~~~ > > 26. > +$subscriber1->safe_psql( > + 'postgres', qq[ > + CREATE TABLE tab_int (a int PRIMARY KEY); > + ALTER SUBSCRIPTION regress_mysub1 REFRESH PUBLICATION; > +]); > + > +$subscriber1->wait_for_subscription_sync; > > Add a comment like > > # Subscribe to the new table data and wait for it to arrive > > ~~~ > > 27. > +# Disable hot_standby_feedback temporarily to stop slot sync worker otherwise > +# the concerned testing scenarios here may be interrupted by different error: > +# 'ERROR: replication slot is active for PID ..' > + > +$standby1->safe_psql('postgres', 'ALTER SYSTEM SET > hot_standby_feedback = off;'); > +$standby1->restart; > > Remove the blank line. > > ~~~ > > 28. 
> +is($standby1->safe_psql('postgres', > + q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = > 'lsub1_slot';}), > + 'lsub1_slot', > + 'synced slot retained on the new primary'); > > There should be some comment like: > > SUGGESTION > # Confirm the synced slot 'lsub1_slot' is retained on the new primary > > ~~~ > > 29. > +# Confirm that data in tab_int replicated on subscriber > +is( $subscriber1->safe_psql('postgres', q{SELECT count(*) FROM tab_int;}), > + "20", > + 'data replicated from the new primary'); > > /replicated on subscriber/replicated on the subscriber/ > > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
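Tying together comment 14 and the 32-bit explanation above, a hedged sketch of the null-safe fetch (the attribute numbers follow the quoted hunk):

Datum       d;
bool        isnull;

/*
 * Fetch the datum first and convert only when it is not null: on 32-bit
 * builds DatumGetLSN() reads the Datum as a pointer, so converting the
 * zero Datum returned for a NULL attribute is what crashed the v51 coding.
 */
d = slot_getattr(tupslot, 3, &isnull);
remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);

d = slot_getattr(tupslot, 4, &isnull);
remote_slot->restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d);

d = slot_getattr(tupslot, 5, &isnull);
remote_slot->catalog_xmin = isnull ? InvalidTransactionId : DatumGetTransactionId(d);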
On Wed, Jan 10, 2024 at 5:53 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > IIUC on the standby we just want to overwrite what we get from primary no? If > > so why we are using those APIs that are meant for the actual decoding slots > > where it needs to take certain logical decisions instead of mere overwriting? > > I think we don't have a strong reason to use these APIs, but it was convenient to > use these APIs as they can take care of updating the slots info and will call > functions like, ReplicationSlotsComputeRequiredXmin, > ReplicationSlotsComputeRequiredLSN internally. Or do you prefer directly overwriting > the fields and call these manually ? I might be missing something but do you want to call ReplicationSlotsComputeRequiredXmin() kind of functions in standby? I mean those will ultimately update the catalog xmin and replication xmin in Procarray and that prevents Vacuum from cleaning up some of the required xids. But on standby, those shared memory parameters are not used IIUC. In my opinion on standby, we just need to update the values in the local slots and whatever we get from remote slots without taking all the logical decisions in the hope that they will all fall into a particular path, for example, if you see LogicalIncreaseXminForSlot(), it is doing following steps of operations as shown below[1]. These all make sense when you are doing candidate-based updation where we first mark the candidates and then update the candidate to real value once you get the confirmation for the LSN. Now following all this logic looks completely weird unless this can fall in a different path I feel it will do some duplicate steps as well. For example in local_slot_update(), first you call LogicalConfirmReceivedLocation() which will set the 'data.confirmed_flush' and then you will call LogicalIncreaseXminForSlot() which will set the 'updated_xmin = true;' and will again call LogicalConfirmReceivedLocation(). I don't think this is the correct way of reusing the function unless you need to go through those paths and I am missing something. [1] LogicalIncreaseXminForSlot() { if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin)) { } else if (current_lsn <= slot->data.confirmed_flush) { } else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr) { } if (updated_xmin) LogicalConfirmReceivedLocation(slot->data.confirmed_flush); } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > +static bool > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > { > ... > + /* Slot ready for sync, so sync it. */ > + else > + { > + /* > + * Sanity check: With hot_standby_feedback enabled and > + * invalidations handled appropriately as above, this should never > + * happen. > + */ > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > + elog(ERROR, > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization" > + " would move it backwards", remote_slot->name, > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > ... > } > > I was thinking about the above code in the patch and as far as I can > think this can only occur if the same name slot is re-created with > prior restart_lsn after the existing slot is dropped. Normally, the > newly created slot (with the same name) will have higher restart_lsn > but one can mimic it by copying some older slot by using > pg_copy_logical_replication_slot(). > > I don't think as mentioned in comments even if hot_standby_feedback is > temporarily set to off, the above shouldn't happen. It can only lead > to invalidated slots on standby. > > To close the above race, I could think of the following ways: > 1. Drop and re-create the slot. > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > ahead of local_slot's LSN then we can update it; but as mentioned in > your previous comment, we need to update all other fields as well. If > we follow this then we probably need to have a check for catalog_xmin > as well. > The second point as mentioned is slightly misleading, so let me try to rephrase it once again: Emit LOG/WARNING in this case and once remote_slot's LSN moves ahead of local_slot's LSN then we can update it; additionally, we need to update all other fields like two_phase as well. If we follow this then we probably need to have a check for catalog_xmin as well along remote_slot's restart_lsn. > Now, related to this the other case which needs some handling is what > if the remote_slot's restart_lsn is greater than local_slot's > restart_lsn but it is a re-created slot with the same name. In that > case, I think the other properties like 'two_phase', 'plugin' could be > different. So, is simply copying those sufficient or do we need to do > something else as well? > Bertrand, Dilip, Sawada-San, and others, please share your opinion on this problem as I think it is important to handle this race condition. -- With Regards, Amit Kapila.
On Thu, Jan 11, 2024 at 3:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jan 10, 2024 at 5:53 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > > IIUC on the standby we just want to overwrite what we get from primary no? If > > > so why we are using those APIs that are meant for the actual decoding slots > > > where it needs to take certain logical decisions instead of mere overwriting? > > > > I think we don't have a strong reason to use these APIs, but it was convenient to > > use these APIs as they can take care of updating the slots info and will call > > functions like, ReplicationSlotsComputeRequiredXmin, > > ReplicationSlotsComputeRequiredLSN internally. Or do you prefer directly overwriting > > the fields and call these manually ? > > I might be missing something but do you want to call > ReplicationSlotsComputeRequiredXmin() kind of functions in standby? I > mean those will ultimately update the catalog xmin and replication > xmin in Procarray and that prevents Vacuum from cleaning up some of > the required xids. But on standby, those shared memory parameters are > not used IIUC. > These xmins are required for logical slots as we allow logical decoding on standby (see GetOldestSafeDecodingTransactionId()). We also invalidate such slots if the required rows are removed on standby. Similarly, we need ReplicationSlotsComputeRequiredLSN() to avoid getting the required WAL removed. > In my opinion on standby, we just need to update the values in the > local slots and whatever we get from remote slots without taking all > the logical decisions in the hope that they will all fall into a > particular path, for example, if you see LogicalIncreaseXminForSlot(), > it is doing following steps of operations as shown below[1]. These > all make sense when you are doing candidate-based updation where we > first mark the candidates and then update the candidate to real value > once you get the confirmation for the LSN. Now following all this > logic looks completely weird unless this can fall in a different path > I feel it will do some duplicate steps as well. For example in > local_slot_update(), first you call LogicalConfirmReceivedLocation() > which will set the 'data.confirmed_flush' and then you will call > LogicalIncreaseXminForSlot() which will set the 'updated_xmin = true;' > and will again call LogicalConfirmReceivedLocation(). > In case (else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr)), even the updated_xmin is not getting set to true which means there is a chance that we will never update the required xmin values. I don't think > this is the correct way of reusing the function unless you need to go > through those paths and I am missing something. > I agree with this conclusion and also think that we should directly update the required fields and call functions like ReplicationSlotsComputeRequiredLSN() wherever required. -- With Regards, Amit Kapila.
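To make the agreed-upon alternative concrete, here is a rough sketch, not the patch code, assuming the synced slot has already been acquired as MyReplicationSlot: overwrite the local slot's positions directly and then recompute the global horizons explicitly:

static void
local_slot_update(RemoteSlot *remote_slot)
{
    ReplicationSlot *slot = MyReplicationSlot;

    Assert(slot->data.invalidated == RS_INVAL_NONE);

    /* Overwrite the positions with what the primary reported */
    SpinLockAcquire(&slot->mutex);
    slot->data.confirmed_flush = remote_slot->confirmed_lsn;
    slot->data.restart_lsn = remote_slot->restart_lsn;
    slot->data.catalog_xmin = remote_slot->catalog_xmin;
    slot->effective_catalog_xmin = remote_slot->catalog_xmin;
    SpinLockRelease(&slot->mutex);

    ReplicationSlotMarkDirty();

    /* Recompute the oldest xmin/LSN that must be retained across all slots */
    ReplicationSlotsComputeRequiredXmin(false);
    ReplicationSlotsComputeRequiredLSN();
}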
Hi, On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > +static bool > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > { > > ... > > + /* Slot ready for sync, so sync it. */ > > + else > > + { > > + /* > > + * Sanity check: With hot_standby_feedback enabled and > > + * invalidations handled appropriately as above, this should never > > + * happen. > > + */ > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + elog(ERROR, > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > + " to remote slot's LSN(%X/%X) as synchronization" > > + " would move it backwards", remote_slot->name, > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > ... > > } > > > > I was thinking about the above code in the patch and as far as I can > > think this can only occur if the same name slot is re-created with > > prior restart_lsn after the existing slot is dropped. Normally, the > > newly created slot (with the same name) will have higher restart_lsn > > but one can mimic it by copying some older slot by using > > pg_copy_logical_replication_slot(). > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > temporarily set to off, the above shouldn't happen. It can only lead > > to invalidated slots on standby. I also think so. > > > > To close the above race, I could think of the following ways: > > 1. Drop and re-create the slot. > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > ahead of local_slot's LSN then we can update it; but as mentioned in > > your previous comment, we need to update all other fields as well. If > > we follow this then we probably need to have a check for catalog_xmin > > as well. IIUC, this would be a sync slot (so not usable until promotion) that could not be used anyway (invalidated), so I'll vote for drop / re-create then. > > Now, related to this the other case which needs some handling is what > > if the remote_slot's restart_lsn is greater than local_slot's > > restart_lsn but it is a re-created slot with the same name. In that > > case, I think the other properties like 'two_phase', 'plugin' could be > > different. So, is simply copying those sufficient or do we need to do > > something else as well? > > > I'm not sure to follow here. If the remote slot is re-created then it would be also dropped / re-created locally, or am I missing something? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ > > > + else > > > + { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > > temporarily set to off, the above shouldn't happen. It can only lead > > > to invalidated slots on standby. > > I also think so. > > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. If > > > we follow this then we probably need to have a check for catalog_xmin > > > as well. > > IIUC, this would be a sync slot (so not usable until promotion) that could > not be used anyway (invalidated), so I'll vote for drop / re-create then. > No, it can happen for non-sync slots as well. > > > Now, related to this the other case which needs some handling is what > > > if the remote_slot's restart_lsn is greater than local_slot's > > > restart_lsn but it is a re-created slot with the same name. In that > > > case, I think the other properties like 'two_phase', 'plugin' could be > > > different. So, is simply copying those sufficient or do we need to do > > > something else as well? > > > > > > > I'm not sure to follow here. If the remote slot is re-created then it would > be also dropped / re-created locally, or am I missing something? > As our slot-syncing mechanism is asynchronous (from time to time we check the slot information on primary), isn't it possible that the same name slot is dropped and recreated between slot-sync worker's checks? -- With Regards, Amit Kapila.
On Thursday, January 11, 2024 11:42 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: Hi, > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot > > > +*remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ else { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback > > > is temporarily set to off, the above shouldn't happen. It can only > > > lead to invalidated slots on standby. > > I also think so. > > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. > > > If we follow this then we probably need to have a check for > > > catalog_xmin as well. > > IIUC, this would be a sync slot (so not usable until promotion) that could not be > used anyway (invalidated), so I'll vote for drop / re-create then. Such race can happen when user drop and re-create the same failover slot on primary as well. For example, user dropped one failover slot and them immediately created a new one by copying from an old slot(using pg_copy_logical_replication_slot). Then the slotsync worker will find the restart_lsn of this slot go backwards. The steps: ---- SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput', false, false, true); SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput', false, false, true); - Advance the restart_lsn of 'test' slot CREATE TABLE test2(a int); INSERT INTO test2 SELECT generate_series(1,10000,1); SELECT slot_name FROM pg_replication_slot_advance('test', pg_current_wal_lsn()); - re-create the test slot but based on the old isolation_slot. SELECT pg_drop_replication_slot('test'); SELECT 'copy' FROM pg_copy_logical_replication_slot('isolation_slot', 'test'); Then the restart_lsn of 'test' slot will go backwards. Best Regards, Hou zj
On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ > > > + else > > > + { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > > temporarily set to off, the above shouldn't happen. It can only lead > > > to invalidated slots on standby. > > I also think so. > > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. If > > > we follow this then we probably need to have a check for catalog_xmin > > > as well. > > IIUC, this would be a sync slot (so not usable until promotion) that could > not be used anyway (invalidated), so I'll vote for drop / re-create then. > The one more drawback I see is that in such a case (where the slot could have dropped on primary) is that it is not advisable to keep it on standby. So, I also think we should drop and re-create the slots in this case unless I am missing something here. -- With Regards, Amit Kapila.
Hi, On Fri, Jan 12, 2024 at 08:42:39AM +0530, Amit Kapila wrote: > On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > > > > > > > To close the above race, I could think of the following ways: > > > > 1. Drop and re-create the slot. > > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > > your previous comment, we need to update all other fields as well. If > > > > we follow this then we probably need to have a check for catalog_xmin > > > > as well. > > > > IIUC, this would be a sync slot (so not usable until promotion) that could > > not be used anyway (invalidated), so I'll vote for drop / re-create then. > > > > No, it can happen for non-sync slots as well. Yeah, I meant that we could decide to drop/re-create only for sync slots. > > > > > Now, related to this the other case which needs some handling is what > > > > if the remote_slot's restart_lsn is greater than local_slot's > > > > restart_lsn but it is a re-created slot with the same name. In that > > > > case, I think the other properties like 'two_phase', 'plugin' could be > > > > different. So, is simply copying those sufficient or do we need to do > > > > something else as well? > > > > > > > > > > > I'm not sure to follow here. If the remote slot is re-created then it would > > be also dropped / re-created locally, or am I missing something? > > > > As our slot-syncing mechanism is asynchronous (from time to time we > check the slot information on primary), isn't it possible that the > same name slot is dropped and recreated between slot-sync worker's > checks? > Yeah, I should have thought harder ;-) So for this case: if we had an easy way to detect that a remote slot has been dropped/re-created, then I think we would also drop and re-create it on the standby. If so, I think we should then update all the fields (that we're currently updating in the "create locally" case) when we detect that (at least) one of the following differs: - dboid - plugin - two_phase Maybe the "best" approach would be to have a way to detect that a slot has been re-created on the primary (but that would mean relying on more than the slot name to "identify" a slot, and probably adding a new member to the struct to do so). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
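To make the comparison being proposed here concrete, this is a rough sketch (not the patch's code) of how the standby could decide that a same-named remote slot looks re-created, by comparing the properties that cannot change for a live slot. The RemoteSlot fields are the ones the patch is already quoted as carrying (database, plugin, two_phase); what to do on a mismatch (drop/re-create vs. overwrite everything) is exactly the open question in this sub-thread.

#include "postgres.h"

#include "commands/dbcommands.h"    /* get_database_oid() */
#include "replication/slot.h"

/* Reduced stand-in for the patch's RemoteSlot; only the fields used here. */
typedef struct RemoteSlot
{
    char    *database;
    char    *plugin;
    bool     two_phase;
} RemoteSlot;

/*
 * Return true if the remote slot's immutable properties no longer match
 * the local copy, i.e. the slot was most likely dropped and re-created
 * under the same name on the primary between two sync cycles.
 */
static bool
remote_slot_looks_recreated(ReplicationSlot *local_slot, RemoteSlot *remote_slot)
{
    Oid     remote_dbid = get_database_oid(remote_slot->database, false);

    if (local_slot->data.database != remote_dbid)
        return true;
    if (strcmp(NameStr(local_slot->data.plugin), remote_slot->plugin) != 0)
        return true;
    if (local_slot->data.two_phase != remote_slot->two_phase)
        return true;

    return false;
}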
Hi, On Fri, Jan 12, 2024 at 03:46:00AM +0000, Zhijie Hou (Fujitsu) wrote: > On Thursday, January 11, 2024 11:42 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote: > > IIUC, this would be a sync slot (so not usable until promotion) that could not be > > used anyway (invalidated), so I'll vote for drop / re-create then. > > Such race can happen when user drop and re-create the same failover slot on > primary as well. For example, user dropped one failover slot and them > immediately created a new one by copying from an old slot(using > pg_copy_logical_replication_slot). Then the slotsync worker will find the > restart_lsn of this slot go backwards. > > The steps: > ---- > SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput', false, false, true); > SELECT 'init' FROM pg_create_logical_replication_slot('test', 'pgoutput', false, false, true); > > - Advance the restart_lsn of 'test' slot > CREATE TABLE test2(a int); > INSERT INTO test2 SELECT generate_series(1,10000,1); > SELECT slot_name FROM pg_replication_slot_advance('test', pg_current_wal_lsn()); > > - re-create the test slot but based on the old isolation_slot. > SELECT pg_drop_replication_slot('test'); > SELECT 'copy' FROM pg_copy_logical_replication_slot('isolation_slot', 'test'); > > Then the restart_lsn of 'test' slot will go backwards. Yeah, that's right. BTW, I think it's worth to add those "corner cases" in the TAP tests related to the sync slot feature (the more coverage the better). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jan 11, 2024 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > +static bool > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > { > > ... > > + /* Slot ready for sync, so sync it. */ > > + else > > + { > > + /* > > + * Sanity check: With hot_standby_feedback enabled and > > + * invalidations handled appropriately as above, this should never > > + * happen. > > + */ > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + elog(ERROR, > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > + " to remote slot's LSN(%X/%X) as synchronization" > > + " would move it backwards", remote_slot->name, > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > ... > > } > > > > I was thinking about the above code in the patch and as far as I can > > think this can only occur if the same name slot is re-created with > > prior restart_lsn after the existing slot is dropped. Normally, the > > newly created slot (with the same name) will have higher restart_lsn > > but one can mimic it by copying some older slot by using > > pg_copy_logical_replication_slot(). > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > temporarily set to off, the above shouldn't happen. It can only lead > > to invalidated slots on standby. > > > > To close the above race, I could think of the following ways: > > 1. Drop and re-create the slot. > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > ahead of local_slot's LSN then we can update it; but as mentioned in > > your previous comment, we need to update all other fields as well. If > > we follow this then we probably need to have a check for catalog_xmin > > as well. > > > > The second point as mentioned is slightly misleading, so let me try to > rephrase it once again: Emit LOG/WARNING in this case and once > remote_slot's LSN moves ahead of local_slot's LSN then we can update > it; additionally, we need to update all other fields like two_phase as > well. If we follow this then we probably need to have a check for > catalog_xmin as well along remote_slot's restart_lsn. > > > Now, related to this the other case which needs some handling is what > > if the remote_slot's restart_lsn is greater than local_slot's > > restart_lsn but it is a re-created slot with the same name. In that > > case, I think the other properties like 'two_phase', 'plugin' could be > > different. So, is simply copying those sufficient or do we need to do > > something else as well? > > > > Bertrand, Dilip, Sawada-San, and others, please share your opinion on > this problem as I think it is important to handle this race condition. Is there any good use case of copying a failover slot in the first place? If it's not a normal use case and we can probably live without it, why not always disable failover during the copy? FYI we always disable two_phase on copied slots. It seems to me that copying a failover slot could lead to problems, as long as we synchronize slots based on their names. IIUC without the copy, this pass should never happen. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Jan 12, 2024 at 5:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Jan 11, 2024 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ > > > + else > > > + { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback is > > > temporarily set to off, the above shouldn't happen. It can only lead > > > to invalidated slots on standby. > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. If > > > we follow this then we probably need to have a check for catalog_xmin > > > as well. > > > > > > > The second point as mentioned is slightly misleading, so let me try to > > rephrase it once again: Emit LOG/WARNING in this case and once > > remote_slot's LSN moves ahead of local_slot's LSN then we can update > > it; additionally, we need to update all other fields like two_phase as > > well. If we follow this then we probably need to have a check for > > catalog_xmin as well along remote_slot's restart_lsn. > > > > > Now, related to this the other case which needs some handling is what > > > if the remote_slot's restart_lsn is greater than local_slot's > > > restart_lsn but it is a re-created slot with the same name. In that > > > case, I think the other properties like 'two_phase', 'plugin' could be > > > different. So, is simply copying those sufficient or do we need to do > > > something else as well? > > > > > > > Bertrand, Dilip, Sawada-San, and others, please share your opinion on > > this problem as I think it is important to handle this race condition. > > Is there any good use case of copying a failover slot in the first > place? If it's not a normal use case and we can probably live without > it, why not always disable failover during the copy? FYI we always > disable two_phase on copied slots. It seems to me that copying a > failover slot could lead to problems, as long as we synchronize slots > based on their names. IIUC without the copy, this pass should never > happen. 
> > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com

There are multiple approaches discussed and tried when it comes to starting a slot-sync worker. I am summarizing all of them here:

1) Make the slotsync worker an auxiliary process (like checkpointer, walwriter, walreceiver etc). The benefit this approach provides is that start and stop can be controlled in a more flexible way, as each auxiliary process can have different checks before starting and different stop conditions. But it needs code duplication for process management (start, stop, crash handling, signals etc) and currently it does not support a db-connection smoothly (none of the auxiliary processes has one so far). We attempted to make the slot-sync worker an auxiliary process and faced some challenges. The slot-sync worker needs a db-connection and thus needs InitPostgres(). But AuxiliaryProcessMain() and InitPostgres() are not compatible, as both invoke common functions and end up setting many callback functions twice (with different args). Also, InitPostgres() does 'MyBackendId' initialization (which further triggers some stuff) which is not needed for an auxiliary process, and so on. Thus, in order to make the slot-sync worker an auxiliary process, we would need something similar to InitPostgres() (a trimmed-down working version), which needs further detailed analysis.

2) Make the slotsync worker a 'special' process like AutoVacLauncher, which is neither an auxiliary process nor a bgworker. It allows a db-connection and also provides the flexibility to have start and stop conditions for the process. But it needs a lot of code duplication around start, stop, fork (windows, non-windows), crash management and so on. It also needs to do a lot of process initialization by itself (which is otherwise done internally by the aux and bgworker infra). And I am not sure if we should be adding a new process as a 'special' one when postgres already provides the bgworker and auxiliary process infrastructure.

3) Make the slotsync worker a bgworker. Here we just need to register our process as a bgworker (RegisterBackgroundWorker()) by providing a relevant start_time and restart_time, and then the process management is well taken care of. It does not need any code duplication and allows a db-connection smoothly in the registered process. The only thing it lacks is the flexibility of having a start condition, which then forces us to have 'enable_syncslot' as a PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I feel enable_syncslot is something that will not be changed frequently, and with the benefits provided by the bgworker infra, this seems a reasonably good approach to choose.

4) Another option is to have the Logical Replication Launcher (or a new process) launch the slot-sync worker. But going by the current design where we have only 1 slotsync worker, maintaining an additional manager process may be an overhead. Especially if we go with the 'Logical Replication Launcher', some extra changes will be needed there: it will need a start_time change from BgWorkerStart_RecoveryFinished to BgWorkerStart_ConsistentState (doable, but wanted to mention the change). And given that the 'Logical Replication Launcher' does not have a db-connection currently, if in future the slotsync validation checks need to execute some SQL query, it cannot do that easily; it would either need the launcher to get a db-connection or need new commands to be implemented for the same.
Thus weighing pros and cons of all these options, we have currently implemented the bgworker approach (approach 3). Any feedback is welcome. Thanks Shveta
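For readers not familiar with the bgworker route, approach (3) boils down to a registration call at postmaster startup, roughly like the sketch below. The function name, worker name, and the enable_syncslot check are assumptions based on this thread, not the patch's code; the patch additionally introduces a dedicated BgWorkerStart_ConsistentState_HotStandby start time, whereas this sketch uses the existing BgWorkerStart_ConsistentState. The PGC_POSTMASTER limitation mentioned above comes from the fact that this decision is made once, before the postmaster forks anything.

#include "postgres.h"

#include "postmaster/bgworker.h"

extern bool enable_syncslot;    /* the patch's GUC */

/*
 * Sketch of approach (3): register the slot-sync worker as a regular
 * background worker while the postmaster is starting up.
 */
void
RegisterSlotSyncWorker(void)
{
    BackgroundWorker worker;

    /*
     * Evaluated once at postmaster start, which is why enable_syncslot
     * ends up being PGC_POSTMASTER with this approach.
     */
    if (!enable_syncslot)
        return;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = 60;   /* restart interval in seconds */
    snprintf(worker.bgw_library_name, sizeof(worker.bgw_library_name),
             "postgres");
    snprintf(worker.bgw_function_name, sizeof(worker.bgw_function_name),
             "ReplSlotSyncWorkerMain");
    snprintf(worker.bgw_name, sizeof(worker.bgw_name),
             "replication slot sync worker");
    snprintf(worker.bgw_type, sizeof(worker.bgw_type),
             "replication slot sync worker");
    worker.bgw_main_arg = (Datum) 0;
    worker.bgw_notify_pid = 0;

    RegisterBackgroundWorker(&worker);
}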
On Friday, January 12, 2024 8:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: Hi, > > On Thu, Jan 11, 2024 at 7:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > +static bool > > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot > > > +*remote_slot) > > > { > > > ... > > > + /* Slot ready for sync, so sync it. */ else { > > > + /* > > > + * Sanity check: With hot_standby_feedback enabled and > > > + * invalidations handled appropriately as above, this should never > > > + * happen. > > > + */ > > > + if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, > > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > + " to remote slot's LSN(%X/%X) as synchronization" > > > + " would move it backwards", remote_slot->name, > > > + LSN_FORMAT_ARGS(slot->data.restart_lsn), > > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > > > ... > > > } > > > > > > I was thinking about the above code in the patch and as far as I can > > > think this can only occur if the same name slot is re-created with > > > prior restart_lsn after the existing slot is dropped. Normally, the > > > newly created slot (with the same name) will have higher restart_lsn > > > but one can mimic it by copying some older slot by using > > > pg_copy_logical_replication_slot(). > > > > > > I don't think as mentioned in comments even if hot_standby_feedback > > > is temporarily set to off, the above shouldn't happen. It can only > > > lead to invalidated slots on standby. > > > > > > To close the above race, I could think of the following ways: > > > 1. Drop and re-create the slot. > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves > > > ahead of local_slot's LSN then we can update it; but as mentioned in > > > your previous comment, we need to update all other fields as well. > > > If we follow this then we probably need to have a check for > > > catalog_xmin as well. > > > > > > > The second point as mentioned is slightly misleading, so let me try to > > rephrase it once again: Emit LOG/WARNING in this case and once > > remote_slot's LSN moves ahead of local_slot's LSN then we can update > > it; additionally, we need to update all other fields like two_phase as > > well. If we follow this then we probably need to have a check for > > catalog_xmin as well along remote_slot's restart_lsn. > > > > > Now, related to this the other case which needs some handling is > > > what if the remote_slot's restart_lsn is greater than local_slot's > > > restart_lsn but it is a re-created slot with the same name. In that > > > case, I think the other properties like 'two_phase', 'plugin' could > > > be different. So, is simply copying those sufficient or do we need > > > to do something else as well? > > > > > > > Bertrand, Dilip, Sawada-San, and others, please share your opinion on > > this problem as I think it is important to handle this race condition. > > Is there any good use case of copying a failover slot in the first place? If it's not > a normal use case and we can probably live without it, why not always disable > failover during the copy? FYI we always disable two_phase on copied slots. It > seems to me that copying a failover slot could lead to problems, as long as we > synchronize slots based on their names. IIUC without the copy, this pass should > never happen. Thanks for the suggestion. I also don't have a use case for this. Attach the V61 patch set that addresses this suggestion. 
And here is the summary of the changes made in each patch. V61-0001 1. Reverts the changes in copy_replication_slot. V61-0002 1. Adds the documents for the steps that user needs to follow to ensure the standby is ready for failover 2. Directly update the fields restart_lsn/confirmed_flush/catalog_xmin instead of using APIs like LogicalConfirmReceivedLocation 3. Updates all the fields(two_phase, failover, plugin) when syncing the slots 4. fixes CFbot failures. 5. Some code style adjustment. (pending comments in last version) 6. Remove some unnecessary Assert and variable assignment (off-list comments from Peter) Thanks Shveta for working on 4 and 5. V61-0003 1. Some documents update related to standby_slot_names and the steps for failover. V61-0004 - No change. Best Regards, Hou zj
Attachment
On Fri, Jan 12, 2024 at 12:07 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Jan 12, 2024 at 08:42:39AM +0530, Amit Kapila wrote: > > On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > I'm not sure to follow here. If the remote slot is re-created then it would > > > be also dropped / re-created locally, or am I missing something? > > > > > > > As our slot-syncing mechanism is asynchronous (from time to time we > > check the slot information on primary), isn't it possible that the > > same name slot is dropped and recreated between slot-sync worker's > > checks? > > > > Yeah, I should have thought harder ;-) So for this case, let's imagine that If we > had an easy way to detect that a remote slot has been drop/re-created then I think > we would also drop and re-create it on the standby too. > > If so, I think we should then update all the fields (that we're currently updating > in the "create locally" case) when we detect that (at least) one of the following differs: > > - dboid > - plugin > - two_phase > Right, I think even if any of restart/confirmed LSN's or xmin has changed then also there is no harm in simply copying all the fields from remote_slot as done by Hou-San in latest patch. > Maybe the "best" approach would be to have a way to detect that a slot has been > re-created on the primary (but that would mean rely on more than the slot name > to "identify" a slot and probably add a new member to the struct to do so). > Right, I also thought so but not sure further complicating the slot machinery is worth detecting this case explicitly. If we see any problem with the idea discussed then we may need to think something along those lines. -- With Regards, Amit Kapila.
On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > There are multiple approaches discussed and tried when it comes to > starting a slot-sync worker. I am summarizing all here: > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > walwriter, walreceiver etc). The benefit this approach provides is, it > can control begin and stop in a more flexible way as each auxiliary > process could have different checks before starting and can have > different stop conditions. But it needs code duplication for process > management(start, stop, crash handling, signals etc) and currently it > does not support db-connection smoothly (none of the auxiliary process > has one so far) > As slotsync worker needs to perform transactions and access syscache, we can't make it an auxiliary process as that doesn't initialize the required stuff like syscache. Also, see the comment "Auxiliary processes don't run transactions ..." in AuxiliaryProcessMain() which means this is not an option. > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > which is neither an Auxiliary process nor a bgworker one. It allows > db-connection and also provides flexibility to have start and stop > conditions for a process. > Yeah, due to these reasons, I think this option is worth considering and another plus point is that this allows us to make enable_syncslot a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > 3) Make slotysnc worker a bgworker. Here we just need to register our > process as a bgworker (RegisterBackgroundWorker()) by providing a > relevant start_time and restart_time and then the process management > is well taken care of. It does not need any code-duplication and > allows db-connection smoothly in registered process. The only thing it > lacks is that it does not provide flexibility of having > start-condition which then makes us to have 'enable_syncslot' as > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > feel enable_syncslot is something which will not be changed frequently > and with the benefits provided by bgworker infra, it seems a > reasonably good option to choose this approach. > I agree but it may be better to make it a PGC_SIGHUP parameter. > 4) Another option is to have Logical Replication Launcher(or a new > process) to launch slot-sync worker. But going by the current design > where we have only 1 slotsync worker, it may be an overhead to have an > additional manager process maintained. > I don't see any good reason to have an additional launcher process here. > > Thus weighing pros and cons of all these options, we have currently > implemented the bgworker approach (approach 3). Any feedback is > welcome. > I vote to go for (2) unless we face difficulties in doing so but (3) is also okay especially if others also think so. -- With Regards, Amit Kapila.
Hi, On Sat, Jan 13, 2024 at 10:05:52AM +0530, Amit Kapila wrote: > On Fri, Jan 12, 2024 at 12:07 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Maybe the "best" approach would be to have a way to detect that a slot has been > > re-created on the primary (but that would mean rely on more than the slot name > > to "identify" a slot and probably add a new member to the struct to do so). > > > > Right, I also thought so but not sure further complicating the slot > machinery is worth detecting this case explicitly. If we see any > problem with the idea discussed then we may need to think something > along those lines. Yeah, let's see. On one hand that would require extra work, but on the other hand it would also probably simplify (and make less bug-prone in the mid-to-long term?) other parts of the code. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jan 15, 2024 at 2:54 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Sat, Jan 13, 2024 at 10:05:52AM +0530, Amit Kapila wrote: > > On Fri, Jan 12, 2024 at 12:07 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > Maybe the "best" approach would be to have a way to detect that a slot has been > > > re-created on the primary (but that would mean rely on more than the slot name > > > to "identify" a slot and probably add a new member to the struct to do so). > > > > > > > Right, I also thought so but not sure further complicating the slot > > machinery is worth detecting this case explicitly. If we see any > > problem with the idea discussed then we may need to think something > > along those lines. > > Yeah, let's see. On one side that would require extra work but on the other side > that would also probably simplify (and less bug prone in the mid-long term?) > other parts of the code. > After following Sawada-San's suggestion to not copy the 'failover' option there doesn't seem to be much special handling, so there is probably less to simplify. -- With Regards, Amit Kapila.
Here are some review comments for patch v61-0002 ====== doc/src/sgml/logical-replication.sgml 1. + <sect2 id="logical-replication-failover-examples"> + <title>Examples: logical replication failover</title> The current documentation structure (after the patch is applied) looks like this: 30.1. Publication 30.2. Subscription 30.2.1. Replication Slot Management 30.2.2. Examples: Set Up Logical Replication 30.2.3. Examples: Deferred Replication Slot Creation 30.2.4. Examples: logical replication failover I don't think it is ideal. Firstly, I think this new section is not just "Examples:"; it is more like instructions for steps to check if a successful failover is possible. IMO call it something like "Logical Replication Failover" or "Replication Slot Failover". Secondly, I don't think this new section strictly belongs underneath the "Subscription" section anymore because IMO it is just as much about the promotion of the publications. Now that you are adding this new (2nd) section about slots, I think the whole structure of this document should be changed like below: SUGGESTION #1 (make a new section 30.3 just for slot-related topics) 30.1. Publication 30.2. Subscription 30.2.1. Examples: Set Up Logical Replication 30.3. Logical Replication Slots 30.3.1. Replication Slot Management 30.3.2. Examples: Deferred Replication Slot Creation 30.3.3. Logical Replication Failover ~ SUGGESTION #2 (keep the existing structure, but give the failover its own new section 30.3) 30.1. Publication 30.2. Subscription 30.2.1. Replication Slot Management 30.2.2. Examples: Set Up Logical Replication 30.2.3. Examples: Deferred Replication Slot Creation 30.3 Logical Replication Failover ~ SUGGESTION #2a (and maybe later you can extract some of the failover examples further) 30.1. Publication 30.2. Subscription 30.2.1. Replication Slot Management 30.2.2. Examples: Set Up Logical Replication 30.2.3. Examples: Deferred Replication Slot Creation 30.3 Logical Replication Failover 30.3.1. Examples: Checking if failover ready ~~~ 2. + <para> + In a logical replication setup, if the publisher server is also the primary + server of the streaming replication, the logical slots on the primary server + can be synchronized to the standby server by specifying <literal>failover = true</literal> + when creating the subscription. Enabling failover ensures a seamless + transition of the subscription to the promoted standby, allowing it to + subscribe to the new primary server without any data loss. + </para> I was initially confused by the wording. How about like below: SUGGESTION When the publisher server is the primary server of a streaming replication, the logical slots on that primary server can be synchronized to the standby server by specifying <literal>failover = true</literal> when creating subscriptions for those publications. Enabling failover ensures a seamless transition of those subscriptions after the standby is promoted. They can continue subscribing to publications now on the new primary server without any data loss. ~~~ 3. + <para> + However, the replication slots are copied asynchronously, which means it's necessary + to confirm that replication slots have been synced to the standby server + before the failover happens. Additionally, to ensure a successful failover, + the standby server must not lag behind the subscriber. 
To confirm + that the standby server is ready for failover, follow these steps: + </para> Minor rewording SUGGESTION Because the slot synchronization logic copies asynchronously, it is necessary to confirm that replication slots have been synced to the standby server before the failover happens. Furthermore, to ensure a successful failover, the standby server must not be lagging behind the subscriber. To confirm that the standby server is indeed ready for failover, follow these 2 steps: ~~~ 4. The instructions said "follow these steps", so the next parts should be rendered as 2 "steps" (using <procedure> markup?) SUGGESTION (show as steps 1,2 and also some minor rewording of the step heading) 1. Confirm that all the necessary logical replication slots have been synced to the standby server. 2. Confirm that the standby server is not lagging behind the subscribers. ~~~ 5. + <para> + Check if all the necessary logical replication slots have been synced to + the standby server. + </para> SUGGESTION Confirm that all the necessary logical replication slots have been synced to the standby server. ~~~ 6. + <listitem> + <para> + On logical subscriber, fetch the slot names that should be synced to the + standby that we plan to promote. SUGGESTION Firstly, on the subscriber node, use the following SQL to identify the slot names that should be... ~~~ 7. +<programlisting> +test_sub=# SELECT + array_agg(slotname) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT oid AS subid, subslotname as slotname + FROM pg_subscription + WHERE subfailover + )); 7a Maybe this ought to include "pg_catalog" schemas? ~ 7b. For consistency, maybe it is better to use a table alias "FROM pg_subscription s" in the UNION also ~~~ 8. + <listitem> + <para> + Check that the logical replication slots exist on the standby server. SUGGESTION Next, check that the logical replication slots identified above exist on the standby server. ~~~ 9. +<programlisting> +test_standby=# SELECT bool_and(synced AND NOT temporary AND conflict_reason IS NULL) AS failover_ready + FROM pg_replication_slots + WHERE slot_name in ('slots'); + failover_ready +---------------- + t 9a. Maybe this ought to include "pg_catalog" schemas? ~ 9b. IIUC that 'slots' reference is supposed to be those names that were found in the prior step. If so, then that point needs to be made clear, and anyway in this case 'slots' is not compatible with the 'sub' name returned by your first SQL. ~~~ 10. + <listitem> + <para> + Query the last replayed WAL on the logical subscriber. SUGGESTION Firstly, on the subscriber node check the last replayed WAL. ~~~ 11. +<programlisting> +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM + (( + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false) + WHEN r.srsubstate = 's' THEN r.srsublsn END) as remote_lsn + FROM pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate IN ('f', 's') AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT pg_replication_origin_progress(CONCAT('pg_' || s.oid), false) AS remote_lsn + FROM pg_subscription s + WHERE subfailover + )); 11a. Maybe this ought to include "pg_catalog" schemas? ~ 11b. /WHERE subfailover/WHERE s.subfailover/ ~~~ 12. 
+ <listitem> + <para> + On the standby server, check that the last-received WAL location + is ahead of the replayed WAL location on the subscriber. SUGGESTION Next, on the standby server check that the last-received WAL location is ahead of the replayed WAL location on the subscriber identified above. ~~~ 13. +</programlisting></para> + </listitem> + <listitem> + <para> + On the standby server, check that the last-received WAL location + is ahead of the replayed WAL location on the subscriber. +<programlisting> +test_standby=# SELECT pg_last_wal_receive_lsn() >= 'remote_lsn_on_subscriber'::pg_lsn AS failover_ready; + failover_ready +---------------- + t IIUC the 'remote_lsn_on_subscriber' is supposed to represent the substitution of the value found in the subscriber server. In this example maybe it would be: SELECT pg_last_wal_receive_lsn() >= '0/3000388'::pg_lsn AS failover_ready; maybe that point can be made more clearly. ~~~ 14. + <para> + If the result (failover_ready) of both above steps is true, it means it is + okay to subscribe to the standby server. + </para> 14a. failover_ready should be rendered as literal. ~ 14b. Does this say what you intended, or did you mean something more like "the standby can be promoted and existing subscriptions will be able to continue without data loss" ====== src/backend/replication/logical/slotsync.c 15. local_slot_update +/* + * Try to update local slot metadata based on the data from the remote slot. + * + * Return false if the data of the remote slot is the same as the local slot. + * Otherwise, return true. + */ There's not really any "try to" here; it either does it if needed or doesn't do it because it's not needed. SUGGESTION If necessary, update local slot metadata based on the data from the remote slot. If no update was needed (the data of the remote slot is the same as the local slot) return false, otherwise true. ~~~ 16. + bool updated_xmin; + bool updated_restart; + Oid dbid; + ReplicationSlot *slot = MyReplicationSlot; + + Assert(slot->data.invalidated == RS_INVAL_NONE); + + updated_xmin = (remote_slot->catalog_xmin != slot->data.catalog_xmin); + updated_restart = (remote_slot->restart_lsn != slot->data.restart_lsn); + dbid = get_database_oid(remote_slot->database, false); + + if (namestrcmp(&slot->data.plugin, remote_slot->plugin) == 0 && + slot->data.database == dbid && !updated_restart && !updated_xmin && + remote_slot->two_phase == slot->data.two_phase && + remote_slot->failover == slot->data.failover && + remote_slot->confirmed_lsn == slot->data.confirmed_flush) + return false; It seems a bit strange to have boolean flags for some of the differences (updated_xmin, updated_restart) but not for the others. I expected it should be for all (e.g. updated_twophase, updated_failover, ...) or none of them. ~~~ 17. synchronize_one_slot + slot_updated = local_slot_update(remote_slot); + + /* Make sure the slot changes persist across server restart */ + if (slot_updated) + { + ReplicationSlotMarkDirty(); + ReplicationSlotSave(); + } IMO this code would be simpler if written like below because then 'slot_updated' is only ever assigned when true instead of maybe overwriting the default again with false: SUGGESTION /* Make sure the slot changes persist across server restart */ if (local_slot_update(remote_slot)) { slot_updated = true; ReplicationSlotMarkDirty(); ReplicationSlotSave(); } ====== src/backend/replication/slot.c 18. 
ReplicationSlotPersist - TEMPORARY v EPHEMERAL I noticed this ReplicationSlotPersist() from v59-0002 was reverted: - * Convert a slot that's marked as RS_EPHEMERAL to a RS_PERSISTENT slot, - * guaranteeing it will be there after an eventual crash. + * Convert a slot that's marked as RS_EPHEMERAL or RS_TEMPORARY to a + * RS_PERSISTENT slot, guaranteeing it will be there after an eventual crash. AFAIK in v61 you are still calling this function with RS_TEMPORARY which is now contrary to the current function comment if you don't change it to also mention RS_TEMPORARY. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > There are multiple approaches discussed and tried when it comes to > > starting a slot-sync worker. I am summarizing all here: > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > walwriter, walreceiver etc). The benefit this approach provides is, it > > can control begin and stop in a more flexible way as each auxiliary > > process could have different checks before starting and can have > > different stop conditions. But it needs code duplication for process > > management(start, stop, crash handling, signals etc) and currently it > > does not support db-connection smoothly (none of the auxiliary process > > has one so far) > > > > As slotsync worker needs to perform transactions and access syscache, > we can't make it an auxiliary process as that doesn't initialize the > required stuff like syscache. Also, see the comment "Auxiliary > processes don't run transactions ..." in AuxiliaryProcessMain() which > means this is not an option. > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > which is neither an Auxiliary process nor a bgworker one. It allows > > db-connection and also provides flexibility to have start and stop > > conditions for a process. > > > > Yeah, due to these reasons, I think this option is worth considering > and another plus point is that this allows us to make enable_syncslot > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > relevant start_time and restart_time and then the process management > > is well taken care of. It does not need any code-duplication and > > allows db-connection smoothly in registered process. The only thing it > > lacks is that it does not provide flexibility of having > > start-condition which then makes us to have 'enable_syncslot' as > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > feel enable_syncslot is something which will not be changed frequently > > and with the benefits provided by bgworker infra, it seems a > > reasonably good option to choose this approach. > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > 4) Another option is to have Logical Replication Launcher(or a new > > process) to launch slot-sync worker. But going by the current design > > where we have only 1 slotsync worker, it may be an overhead to have an > > additional manager process maintained. > > > > I don't see any good reason to have an additional launcher process here. > > > > > Thus weighing pros and cons of all these options, we have currently > > implemented the bgworker approach (approach 3). Any feedback is > > welcome. > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > is also okay especially if others also think so. I am not against any of the approaches but I still feel that when we have a standard way of doing things (bgworker) we should not keep adding code to do things in a special way unless there is a strong reason to do so. Now we need to decide if 'enable_syncslot' being PGC_POSTMASTER is a strong reason to go the non-standard way? 
If yes, then we should think of option 2; otherwise, option 3 seems better in my understanding (which may be limited by my short experience here), so I am all ears to what others think on this. thanks Shveta
On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > There are multiple approaches discussed and tried when it comes to > > > starting a slot-sync worker. I am summarizing all here: > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > can control begin and stop in a more flexible way as each auxiliary > > > process could have different checks before starting and can have > > > different stop conditions. But it needs code duplication for process > > > management(start, stop, crash handling, signals etc) and currently it > > > does not support db-connection smoothly (none of the auxiliary process > > > has one so far) > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > we can't make it an auxiliary process as that doesn't initialize the > > required stuff like syscache. Also, see the comment "Auxiliary > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > means this is not an option. > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > db-connection and also provides flexibility to have start and stop > > > conditions for a process. > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > and another plus point is that this allows us to make enable_syncslot > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > relevant start_time and restart_time and then the process management > > > is well taken care of. It does not need any code-duplication and > > > allows db-connection smoothly in registered process. The only thing it > > > lacks is that it does not provide flexibility of having > > > start-condition which then makes us to have 'enable_syncslot' as > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > feel enable_syncslot is something which will not be changed frequently > > > and with the benefits provided by bgworker infra, it seems a > > > reasonably good option to choose this approach. > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > process) to launch slot-sync worker. But going by the current design > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > additional manager process maintained. > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > implemented the bgworker approach (approach 3). Any feedback is > > > welcome. > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > is also okay especially if others also think so. > > I am not against any of the approaches but I still feel that when we > have a standard way of doing things (bgworker) we should not keep > adding code to do things in a special way unless there is a strong > reason to do so. 
Now we need to decide if 'enable_syncslot' being > PGC_POSTMASTER is a strong reason to go the non-standard way? > Agreed and as said earlier I think it is better to make it a PGC_SIGHUP. Also, not sure we can say it is a non-standard way as the autovacuum launcher is already handled in the same way. One more minor thing is that it will save us from having a new bgworker state BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. > If yes, > then we should think of option 2 else option 3 seems better in my > understanding (which may be limited due to my short experience here), > so I am all ears to what others think on this. > I also think it would be better if more people share their opinion on this matter. -- With Regards, Amit Kapila.
On Tue, Jan 16, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > Agreed and as said earlier I think it is better to make it a > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as > already autovacuum launcher is handled in the same way. One more minor > thing is it will save us for having a new bgworker state > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. Yeah, it's not a nonstandard way. But bgworker provides a lot of built-in infrastructure which we would otherwise have to maintain ourselves if we opt for option 2. From the simplicity point of view I would have preferred option 3, but I would rather have this be PGC_SIGHUP than have that simplicity. But anyway, if there are issues in doing so then we can keep it PGC_POSTMASTER; it's worth trying this out. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
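For what it's worth, the GUC side of PGC_SIGHUP is the easy part: a guc_tables.c entry along the lines of the sketch below (placement, category, and wording assumed, not taken from the patch) is all that changes there. The real work is deciding what the already-running (or not-yet-running) worker does when the value flips at runtime, which is what the process-model choice above is about.

/* Sketch of a ConfigureNamesBool[] entry in guc_tables.c; only the
 * PGC_SIGHUP context is the point here. */
{
    {"enable_syncslot", PGC_SIGHUP, REPLICATION_STANDBY,
        gettext_noop("Enables a standby to synchronize logical replication slots from the primary."),
        NULL},
    &enable_syncslot,
    false,
    NULL, NULL, NULL
},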
On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > > can control begin and stop in a more flexible way as each auxiliary > > > > process could have different checks before starting and can have > > > > different stop conditions. But it needs code duplication for process > > > > management(start, stop, crash handling, signals etc) and currently it > > > > does not support db-connection smoothly (none of the auxiliary process > > > > has one so far) > > > > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > > we can't make it an auxiliary process as that doesn't initialize the > > > required stuff like syscache. Also, see the comment "Auxiliary > > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > > means this is not an option. > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > > db-connection and also provides flexibility to have start and stop > > > > conditions for a process. > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > and another plus point is that this allows us to make enable_syncslot > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > relevant start_time and restart_time and then the process management > > > > is well taken care of. It does not need any code-duplication and > > > > allows db-connection smoothly in registered process. The only thing it > > > > lacks is that it does not provide flexibility of having > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > > feel enable_syncslot is something which will not be changed frequently > > > > and with the benefits provided by bgworker infra, it seems a > > > > reasonably good option to choose this approach. > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > > process) to launch slot-sync worker. But going by the current design > > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > > additional manager process maintained. > > > > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > > implemented the bgworker approach (approach 3). Any feedback is > > > > welcome. > > > > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > > is also okay especially if others also think so. 
> > > > I am not against any of the approaches but I still feel that when we > > have a standard way of doing things (bgworker) we should not keep > > adding code to do things in a special way unless there is a strong > > reason to do so. Now we need to decide if 'enable_syncslot' being > > PGC_POSTMASTER is a strong reason to go the non-standard way? > > > > Agreed and as said earlier I think it is better to make it a > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as > already autovacuum launcher is handled in the same way. One more minor > thing is it will save us for having a new bgworker state > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby for the slotsync worker? Isn't it sufficient that the slotsync worker exits if not in hot standby mode? Is there any technical difficulty or obstacle to make the slotsync worker start using bgworker after reloading the config file? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
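[To make the alternative being asked about concrete, here is a minimal sketch, not the patch's actual code, of a worker registered with the existing BgWorkerStart_ConsistentState start time that simply exits when the server is not in recovery. The function name is illustrative. As the replies below note, the drawback is that such a worker would still be launched on a primary at every server start only to exit immediately.]

#include "postgres.h"
#include "access/xlog.h"		/* RecoveryInProgress() */
#include "storage/ipc.h"		/* proc_exit() */

void
SlotSyncWorkerMainSketch(Datum main_arg)
{
	(void) main_arg;			/* unused in this sketch */

	/* Not a standby (or already promoted): nothing to synchronize. */
	if (!RecoveryInProgress())
	{
		ereport(LOG,
				(errmsg("slot sync worker exiting because the server is not in recovery")));
		proc_exit(0);
	}

	/* ... the normal slot-synchronization loop would go here ... */
}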
Hi, On Sat, Jan 13, 2024 at 12:53:50PM +0530, Amit Kapila wrote: > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > There are multiple approaches discussed and tried when it comes to > > starting a slot-sync worker. I am summarizing all here: > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > walwriter, walreceiver etc). The benefit this approach provides is, it > > can control begin and stop in a more flexible way as each auxiliary > > process could have different checks before starting and can have > > different stop conditions. But it needs code duplication for process > > management(start, stop, crash handling, signals etc) and currently it > > does not support db-connection smoothly (none of the auxiliary process > > has one so far) > > > > As slotsync worker needs to perform transactions and access syscache, > we can't make it an auxiliary process as that doesn't initialize the > required stuff like syscache. Also, see the comment "Auxiliary > processes don't run transactions ..." in AuxiliaryProcessMain() which > means this is not an option. > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > which is neither an Auxiliary process nor a bgworker one. It allows > > db-connection and also provides flexibility to have start and stop > > conditions for a process. > > > > Yeah, due to these reasons, I think this option is worth considering > and another plus point is that this allows us to make enable_syncslot > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > relevant start_time and restart_time and then the process management > > is well taken care of. It does not need any code-duplication and > > allows db-connection smoothly in registered process. The only thing it > > lacks is that it does not provide flexibility of having > > start-condition which then makes us to have 'enable_syncslot' as > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > feel enable_syncslot is something which will not be changed frequently > > and with the benefits provided by bgworker infra, it seems a > > reasonably good option to choose this approach. > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > 4) Another option is to have Logical Replication Launcher(or a new > > process) to launch slot-sync worker. But going by the current design > > where we have only 1 slotsync worker, it may be an overhead to have an > > additional manager process maintained. > > > > I don't see any good reason to have an additional launcher process here. > > > > > Thus weighing pros and cons of all these options, we have currently > > implemented the bgworker approach (approach 3). Any feedback is > > welcome. > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > is also okay especially if others also think so. > Yeah, I think that (2) would be the "ideal" one but (3) is fine too. I think that if we think/see that (2) is too "complicated"/long to implement maybe we could do (3) initially and switch to (2) later. What I mean by that is that I don't think that not doing (2) should be a blocker. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > > > can control begin and stop in a more flexible way as each auxiliary > > > > > process could have different checks before starting and can have > > > > > different stop conditions. But it needs code duplication for process > > > > > management(start, stop, crash handling, signals etc) and currently it > > > > > does not support db-connection smoothly (none of the auxiliary process > > > > > has one so far) > > > > > > > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > > > we can't make it an auxiliary process as that doesn't initialize the > > > > required stuff like syscache. Also, see the comment "Auxiliary > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > > > means this is not an option. > > > > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > > > db-connection and also provides flexibility to have start and stop > > > > > conditions for a process. > > > > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > > and another plus point is that this allows us to make enable_syncslot > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > > relevant start_time and restart_time and then the process management > > > > > is well taken care of. It does not need any code-duplication and > > > > > allows db-connection smoothly in registered process. The only thing it > > > > > lacks is that it does not provide flexibility of having > > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > > > feel enable_syncslot is something which will not be changed frequently > > > > > and with the benefits provided by bgworker infra, it seems a > > > > > reasonably good option to choose this approach. > > > > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > > > process) to launch slot-sync worker. But going by the current design > > > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > > > additional manager process maintained. > > > > > > > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > > > implemented the bgworker approach (approach 3). Any feedback is > > > > > welcome. 
> > > > > > > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > > > is also okay especially if others also think so. > > > > > > I am not against any of the approaches but I still feel that when we > > > have a standard way of doing things (bgworker) we should not keep > > > adding code to do things in a special way unless there is a strong > > > reason to do so. Now we need to decide if 'enable_syncslot' being > > > PGC_POSTMASTER is a strong reason to go the non-standard way? > > > > > > > Agreed and as said earlier I think it is better to make it a > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as > > already autovacuum launcher is handled in the same way. One more minor > > thing is it will save us for having a new bgworker state > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. > > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby > for the slotsync worker? Isn't it sufficient that the slotsync worker > exits if not in hot standby mode? It is doable, but that will mean starting slot-sync worker even on primary on every server restart which does not seem like a good idea. We wanted to have a way where-in it does not start itself in non-standby mode. > Is there any technical difficulty or obstacle to make the slotsync > worker start using bgworker after reloading the config file? When we register slotsync worker as bgworker, we can only register the bgworker before initializing shared memory, we cannot register dynamically in the cycle of ServerLoop and thus we do not have flexibility of registering/deregistering the bgworker (or controlling the bgworker start) based on config parameters each time they change. We can always start slot-sync worker and let it check if enable_syncslot is ON. If not, exit and retry the next time when postmaster will restart it after restart_time(60sec). The downside of this approach is, even if any user does not want slot-sync functionality and thus has permanently disabled 'enable_syncslot', it will keep on restarting and exiting there. thanks Shveta
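[To illustrate the registration constraint described above: a static background worker is registered once, while the postmaster is starting up and before shared memory is initialized, so its start condition and restart interval are fixed at that point and cannot be re-evaluated on a SIGHUP. A rough sketch follows; it is not the patch's actual registration code, and the 60-second value just mirrors the restart_time mentioned above.]

#include "postgres.h"
#include "postmaster/bgworker.h"

static void
RegisterSlotSyncWorkerSketch(void)
{
	BackgroundWorker bgw;

	memset(&bgw, 0, sizeof(bgw));
	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
		BGWORKER_BACKEND_DATABASE_CONNECTION;	/* needs a DB connection */
	bgw.bgw_start_time = BgWorkerStart_ConsistentState;	/* fixed at registration time */
	bgw.bgw_restart_time = 60;	/* seconds before the postmaster restarts it if it exits with code 1 */
	snprintf(bgw.bgw_library_name, sizeof(bgw.bgw_library_name), "postgres");
	snprintf(bgw.bgw_function_name, sizeof(bgw.bgw_function_name), "ReplSlotSyncWorkerMain");
	snprintf(bgw.bgw_name, sizeof(bgw.bgw_name), "slot sync worker");
	snprintf(bgw.bgw_type, sizeof(bgw.bgw_type), "slot sync worker");

	/* Must run during postmaster startup; cannot be called later from ServerLoop. */
	RegisterBackgroundWorker(&bgw);
}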
On Tue, Jan 16, 2024 at 3:10 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > > > > can control begin and stop in a more flexible way as each auxiliary > > > > > > process could have different checks before starting and can have > > > > > > different stop conditions. But it needs code duplication for process > > > > > > management(start, stop, crash handling, signals etc) and currently it > > > > > > does not support db-connection smoothly (none of the auxiliary process > > > > > > has one so far) > > > > > > > > > > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > > > > we can't make it an auxiliary process as that doesn't initialize the > > > > > required stuff like syscache. Also, see the comment "Auxiliary > > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > > > > means this is not an option. > > > > > > > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > > > > db-connection and also provides flexibility to have start and stop > > > > > > conditions for a process. > > > > > > > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > > > and another plus point is that this allows us to make enable_syncslot > > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > > > relevant start_time and restart_time and then the process management > > > > > > is well taken care of. It does not need any code-duplication and > > > > > > allows db-connection smoothly in registered process. The only thing it > > > > > > lacks is that it does not provide flexibility of having > > > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > > > > feel enable_syncslot is something which will not be changed frequently > > > > > > and with the benefits provided by bgworker infra, it seems a > > > > > > reasonably good option to choose this approach. > > > > > > > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > > > > process) to launch slot-sync worker. But going by the current design > > > > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > > > > additional manager process maintained. 
> > > > > > > > > > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > > > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > > > > implemented the bgworker approach (approach 3). Any feedback is > > > > > > welcome. > > > > > > > > > > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > > > > is also okay especially if others also think so. > > > > > > > > I am not against any of the approaches but I still feel that when we > > > > have a standard way of doing things (bgworker) we should not keep > > > > adding code to do things in a special way unless there is a strong > > > > reason to do so. Now we need to decide if 'enable_syncslot' being > > > > PGC_POSTMASTER is a strong reason to go the non-standard way? > > > > > > > > > > Agreed and as said earlier I think it is better to make it a > > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as > > > already autovacuum launcher is handled in the same way. One more minor > > > thing is it will save us for having a new bgworker state > > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. > > > > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby > > for the slotsync worker? Isn't it sufficient that the slotsync worker > > exits if not in hot standby mode? > > It is doable, but that will mean starting slot-sync worker even on > primary on every server restart which does not seem like a good idea. > We wanted to have a way where-in it does not start itself in > non-standby mode. > > > Is there any technical difficulty or obstacle to make the slotsync > > worker start using bgworker after reloading the config file? > > When we register slotsync worker as bgworker, we can only register the > bgworker before initializing shared memory, we cannot register > dynamically in the cycle of ServerLoop and thus we do not have > flexibility of registering/deregistering the bgworker (or controlling > the bgworker start) based on config parameters each time they change. > We can always start slot-sync worker and let it check if > enable_syncslot is ON. If not, exit and retry the next time when > postmaster will restart it after restart_time(60sec). The downside of > this approach is, even if any user does not want slot-sync > functionality and thus has permanently disabled 'enable_syncslot', it > will keep on restarting and exiting there. PFA v62. Details: v62-001: No change. v62-002: 1) Addressed slotsync.c related comments by Peter in [1]. 2) Addressed CFBot failure where there was a crash in 32 bit env while accessing DatumGetLSN 3) Addressed another CFBot failure where the test for '050_standby_failover_slots_sync.pl' was hanging. Thanks Hou-San for this fix. v62-003: It is a new patch which attempts to implement slot-sync worker as a special process which is neither a bgworker nor an Auxiliary process. Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if it is hot-standby and 'enable_syncslot' is ON. v62-004: Small change in document. v62-005: No change v62-006: Separated the failover-ready validation steps into this separate doc-patch (which were earlier present in v61-002 and v61-003). Also addressed some of the doc comments by Peter in [1]. Thanks Hou-San for providing this patch. 
[1]: https://www.postgresql.org/message-id/CAHut%2BPteZVNx1jQ6Hs3mEdoC%3DDNALVpJJ2mZDYim7sU-04tiaw%40mail.gmail.com thanks Shveta
Attachment
- v62-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v62-0005-Non-replication-connection-and-app_name-change.patch
- v62-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v62-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v62-0003-Slot-sync-worker-as-a-special-process.patch
- v62-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
On Tuesday, January 16, 2024 9:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v61-0002 Thanks for the comments. > > ====== > doc/src/sgml/logical-replication.sgml > > 1. > + <sect2 id="logical-replication-failover-examples"> > + <title>Examples: logical replication failover</title> > > The current documentation structure (after the patch is applied) looks > like this: > > 30.1. Publication > 30.2. Subscription > 30.2.1. Replication Slot Management > 30.2.2. Examples: Set Up Logical Replication > 30.2.3. Examples: Deferred Replication Slot Creation > 30.2.4. Examples: logical replication failover > > I don't think it is ideal. > > Firstly, I think this new section is not just "Examples:"; it is more > like instructions for steps to check if a successful failover is > possible. IMO call it something like "Logical Replication Failover" or > "Replication Slot Failover". > > Secondly, I don't think this new section strictly belongs underneath > the "Subscription" section anymore because IMO it is just as much > about the promotion of the publications. Now that you are adding this > new (2nd) section about slots, I think the whole structure of this > document should be changed like below: > > SUGGESTION #1 (make a new section 30.3 just for slot-related topics) > > 30.1. Publication > 30.2. Subscription > 30.2.1. Examples: Set Up Logical Replication > 30.3. Logical Replication Slots > 30.3.1. Replication Slot Management > 30.3.2. Examples: Deferred Replication Slot Creation > 30.3.3. Logical Replication Failover > > ~ > > SUGGESTION #2 (keep the existing structure, but give the failover its > own new section 30.3) > > 30.1. Publication > 30.2. Subscription > 30.2.1. Replication Slot Management > 30.2.2. Examples: Set Up Logical Replication > 30.2.3. Examples: Deferred Replication Slot Creation > 30.3 Logical Replication Failover I used this version for now as I am sure about changing other section. > > ~ > > SUGGESTION #2a (and maybe later you can extract some of the failover > examples further) > > 30.1. Publication > 30.2. Subscription > 30.2.1. Replication Slot Management > 30.2.2. Examples: Set Up Logical Replication > 30.2.3. Examples: Deferred Replication Slot Creation > 30.3 Logical Replication Failover > 30.3.1. Examples: Checking if failover ready > > ~~~ > > 2. > + <para> > + In a logical replication setup, if the publisher server is also the primary > + server of the streaming replication, the logical slots on the > primary server > + can be synchronized to the standby server by specifying > <literal>failover = true</literal> > + when creating the subscription. Enabling failover ensures a seamless > + transition of the subscription to the promoted standby, allowing it to > + subscribe to the new primary server without any data loss. > + </para> > > I was initially confused by the wording. How about like below: > > SUGGESTION > When the publisher server is the primary server of a streaming > replication, the logical slots on that primary server can be > synchronized to the standby server by specifying <literal>failover = > true</literal> when creating subscriptions for those publications. > Enabling failover ensures a seamless transition of those subscriptions > after the standby is promoted. They can continue subscribing to > publications now on the new primary server without any data loss. Changed as suggested. > > ~~~ > > 3. 
> + <para> > + However, the replication slots are copied asynchronously, which > means it's necessary > + to confirm that replication slots have been synced to the standby server > + before the failover happens. Additionally, to ensure a successful failover, > + the standby server must not lag behind the subscriber. To confirm > + that the standby server is ready for failover, follow these steps: > + </para> > > Minor rewording > > SUGGESTION > Because the slot synchronization logic copies asynchronously, it is > necessary to confirm that replication slots have been synced to the > standby server before the failover happens. Furthermore, to ensure a > successful failover, the standby server must not be lagging behind the > subscriber. To confirm that the standby server is indeed ready for > failover, follow these 2 steps: Changed as suggested. > > ~~~ > > 4. > The instructions said "follow these steps", so the next parts should > be rendered as 2 "steps" (using <procedure> markup?) > > SUGGESTION (show as steps 1,2 and also some minor rewording of the > step heading) > > 1. Confirm that all the necessary logical replication slots have been > synced to the standby server. > 2. Confirm that the standby server is not lagging behind the subscribers. > Changed as suggested. > ~~~ > > 5. > + <para> > + Check if all the necessary logical replication slots have been synced to > + the standby server. > + </para> > > SUGGESTION > Confirm that all the necessary logical replication slots have been > synced to the standby server. > Changed as suggested. > ~~~ > > 6. > + <listitem> > + <para> > + On logical subscriber, fetch the slot names that should be synced to > the > + standby that we plan to promote. > > SUGGESTION > Firstly, on the subscriber node, use the following SQL to identify the > slot names that should be... > Changed as suggested. > ~~~ > > 7. > +<programlisting> > +test_sub=# SELECT > + array_agg(slotname) AS slots > + FROM > + (( > + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname > + FROM pg_control_system() ctl, pg_subscription_rel r, > pg_subscription s > + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND > s.subfailover > + ) UNION ( > + SELECT oid AS subid, subslotname as slotname > + FROM pg_subscription > + WHERE subfailover > + )); > > 7a > Maybe this ought to include "pg_catalog" schemas? After searching other query examples, I think most of them don’t add this for either function or system table. So, I didn’t add this. > > ~ > > 7b. > For consistency, maybe it is better to use a table alias "FROM > pg_subscription s" in the UNION also Added. > > ~~~ > > 8. > + <listitem> > + <para> > + Check that the logical replication slots exist on the standby server. > > SUGGESTION > Next, check that the logical replication slots identified above exist > on the standby server. Changed as suggested. > > ~~~ > > 9. > +<programlisting> > +test_standby=# SELECT bool_and(synced AND NOT temporary AND > conflict_reason IS NULL) AS failover_ready > + FROM pg_replication_slots > + WHERE slot_name in ('slots'); > + failover_ready > +---------------- > + t > > 9a. > Maybe this ought to include "pg_catalog" schemas? Same as above. > > ~ > > 9b. > IIUC that 'slots' reference is supposed to be those names that were > found in the prior step. If so, then that point needs to be made > clear, and anyway in this case 'slots' is not compatible with the > 'sub' name returned by your first SQL. Changed as suggested. > > ~~~ > > 10. 
> + <listitem> > + <para> > + Query the last replayed WAL on the logical subscriber. > > SUGGESTION > Firstly, on the subscriber node check the last replayed WAL. > Changed as suggested. > ~~~ > > 11. > +<programlisting> > +test_sub=# SELECT > + MAX(remote_lsn) AS remote_lsn_on_subscriber > + FROM > + (( > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || > r.srrelid), false) > + WHEN r.srsubstate = 's' THEN r.srsublsn > END) as remote_lsn > + FROM pg_subscription_rel r, pg_subscription s > + WHERE r.srsubstate IN ('f', 's') AND s.oid = r.srsubid > AND s.subfailover > + ) UNION ( > + SELECT pg_replication_origin_progress(CONCAT('pg_' || > s.oid), false) AS remote_lsn > + FROM pg_subscription s > + WHERE subfailover > + )); > > 11a. > Maybe this ought to include "pg_catalog" schemas? Same as above. > > ~ > > 11b. > /WHERE subfailover/WHERE s.subfailover/ > > ~~~ > > 12. > + <listitem> > + <para> > + On the standby server, check that the last-received WAL location > + is ahead of the replayed WAL location on the subscriber. > > SUGGESTION > Next, on the standby server check that the last-received WAL location > is ahead of the replayed WAL location on the subscriber identified > above. > Changed as suggested. > ~~~ > > 13. > +</programlisting></para> > + </listitem> > + <listitem> > + <para> > + On the standby server, check that the last-received WAL location > + is ahead of the replayed WAL location on the subscriber. > +<programlisting> > +test_standby=# SELECT pg_last_wal_receive_lsn() >= > 'remote_lsn_on_subscriber'::pg_lsn AS failover_ready; > + failover_ready > +---------------- > + t > > IIUC the 'remote_lsn_on_subscriber' is supposed to represent the > substitution of the value found in the subscriber server. In this > example maybe it would be: > SELECT pg_last_wal_receive_lsn() >= '0/3000388'::pg_lsn AS failover_ready; > > maybe that point can be made more clearly. I have changed it to use the actual LSN got in last step. > > ~~~ > > 14. > + <para> > + If the result (failover_ready) of both above steps is true, it means it is > + okay to subscribe to the standby server. > + </para> > > 14a. > failover_ready should be rendered as literal. Added. > > ~ > > 14b. > Does this say what you intended, or did you mean something more like > "the standby can be promoted and existing subscriptions will be able > to continue without data loss" I used the later part of your suggestion as I think promotion depends not only on logical replication part. Best Regards, Hou zj
On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > There are multiple approaches discussed and tried when it comes to > > starting a slot-sync worker. I am summarizing all here: > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > walwriter, walreceiver etc). The benefit this approach provides is, it > > can control begin and stop in a more flexible way as each auxiliary > > process could have different checks before starting and can have > > different stop conditions. But it needs code duplication for process > > management(start, stop, crash handling, signals etc) and currently it > > does not support db-connection smoothly (none of the auxiliary process > > has one so far) > > > > As slotsync worker needs to perform transactions and access syscache, > we can't make it an auxiliary process as that doesn't initialize the > required stuff like syscache. Also, see the comment "Auxiliary > processes don't run transactions ..." in AuxiliaryProcessMain() which > means this is not an option. > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > which is neither an Auxiliary process nor a bgworker one. It allows > > db-connection and also provides flexibility to have start and stop > > conditions for a process. > > > > Yeah, due to these reasons, I think this option is worth considering > and another plus point is that this allows us to make enable_syncslot > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > relevant start_time and restart_time and then the process management > > is well taken care of. It does not need any code-duplication and > > allows db-connection smoothly in registered process. The only thing it > > lacks is that it does not provide flexibility of having > > start-condition which then makes us to have 'enable_syncslot' as > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > feel enable_syncslot is something which will not be changed frequently > > and with the benefits provided by bgworker infra, it seems a > > reasonably good option to choose this approach. > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > 4) Another option is to have Logical Replication Launcher(or a new > > process) to launch slot-sync worker. But going by the current design > > where we have only 1 slotsync worker, it may be an overhead to have an > > additional manager process maintained. > > > > I don't see any good reason to have an additional launcher process here. > > > > > Thus weighing pros and cons of all these options, we have currently > > implemented the bgworker approach (approach 3). Any feedback is > > welcome. > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > is also okay especially if others also think so. Okay. Attempted approach 2 as a separate patch in v62-0003. Approach 3 (bgworker) is still maintained in v62-002. thanks Shveta
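[For contrast with the static bgworker registration, the "special process" approach lets the postmaster re-evaluate the launch condition in its main loop, the way it does for the autovacuum launcher, which is what makes a PGC_SIGHUP enable_syncslot workable. A rough sketch of the kind of check that would live in postmaster.c; SlotSyncWorkerPID and StartSlotSyncWorker() are illustrative names, not the actual code of v62-0003.]

/* In postmaster.c (sketch only): */
static pid_t SlotSyncWorkerPID = 0;

static void
MaybeStartSlotSyncWorker(void)
{
	/* Re-checked from ServerLoop, so a reloaded enable_syncslot takes effect. */
	if (SlotSyncWorkerPID == 0 &&
		pmState == PM_HOT_STANDBY &&
		enable_syncslot)
		SlotSyncWorkerPID = StartSlotSyncWorker();	/* fork_process() wrapper, like StartAutoVacLauncher() */
}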
About v62-0001: As stated in the patch comment: But note that this commit does not yet include the capability to actually sync the replication slot; the next patch will address that. ~~~ Because of this, I think it might be prudent to separate the documentation portion from this patch so that it can be pushed later when the actual synchronize capability also gets pushed. It would not be good for the PG documentation on HEAD to be describing behaviour that does not yet exist. (e.g. if patch 0001 is pushed early, but then there is some delay or problems getting the subsequent patches committed). ====== Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Jan 16, 2024 at 6:40 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > > > > can control begin and stop in a more flexible way as each auxiliary > > > > > > process could have different checks before starting and can have > > > > > > different stop conditions. But it needs code duplication for process > > > > > > management(start, stop, crash handling, signals etc) and currently it > > > > > > does not support db-connection smoothly (none of the auxiliary process > > > > > > has one so far) > > > > > > > > > > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > > > > we can't make it an auxiliary process as that doesn't initialize the > > > > > required stuff like syscache. Also, see the comment "Auxiliary > > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > > > > means this is not an option. > > > > > > > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > > > > db-connection and also provides flexibility to have start and stop > > > > > > conditions for a process. > > > > > > > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > > > and another plus point is that this allows us to make enable_syncslot > > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > > > relevant start_time and restart_time and then the process management > > > > > > is well taken care of. It does not need any code-duplication and > > > > > > allows db-connection smoothly in registered process. The only thing it > > > > > > lacks is that it does not provide flexibility of having > > > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > > > > feel enable_syncslot is something which will not be changed frequently > > > > > > and with the benefits provided by bgworker infra, it seems a > > > > > > reasonably good option to choose this approach. > > > > > > > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > > > > process) to launch slot-sync worker. But going by the current design > > > > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > > > > additional manager process maintained. 
> > > > > > > > > > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > > > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > > > > implemented the bgworker approach (approach 3). Any feedback is > > > > > > welcome. > > > > > > > > > > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > > > > is also okay especially if others also think so. > > > > > > > > I am not against any of the approaches but I still feel that when we > > > > have a standard way of doing things (bgworker) we should not keep > > > > adding code to do things in a special way unless there is a strong > > > > reason to do so. Now we need to decide if 'enable_syncslot' being > > > > PGC_POSTMASTER is a strong reason to go the non-standard way? > > > > > > > > > > Agreed and as said earlier I think it is better to make it a > > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as > > > already autovacuum launcher is handled in the same way. One more minor > > > thing is it will save us for having a new bgworker state > > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. > > > > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby > > for the slotsync worker? Isn't it sufficient that the slotsync worker > > exits if not in hot standby mode? > > It is doable, but that will mean starting slot-sync worker even on > primary on every server restart which does not seem like a good idea. > We wanted to have a way where-in it does not start itself in > non-standby mode. Understood. Another idea would be that the startup process dynamically registers the slotsync worker if hot_standby is enabled. But it doesn't seem like the right approach. > > > Is there any technical difficulty or obstacle to make the slotsync > > worker start using bgworker after reloading the config file? > > When we register slotsync worker as bgworker, we can only register the > bgworker before initializing shared memory, we cannot register > dynamically in the cycle of ServerLoop and thus we do not have > flexibility of registering/deregistering the bgworker (or controlling > the bgworker start) based on config parameters each time they change. > We can always start slot-sync worker and let it check if > enable_syncslot is ON. If not, exit and retry the next time when > postmaster will restart it after restart_time(60sec). The downside of > this approach is, even if any user does not want slot-sync > functionality and thus has permanently disabled 'enable_syncslot', it > will keep on restarting and exiting there. Thanks for the explanation. It sounds like it's not impossible but would require some work. If allowing bgworkers to start also on SIGUP is a general improvement, we can implement it later while having enable_syncslot PGC_POSTMASTER at this time. Then, we will be able to make the enable_syncslot PGC_SIGUP later. BTW I think I found a race condition in the v61 patch to cause that the slotsync worker continues working even after promotion (I've not tested with v62 patch though). At the time when the startup shutdown the slotsync worker in FinishWalRecovery(), the postmaster's pmState is still PM_HOT_STANDBY. And if the slotsync worker is not running when the startup process attempts to shutdown it, ShutDownSlotSync() does nothing. 
Therefore, if the startup process does not actually shut down the slotsync worker (because it was not running at that point), the postmaster could relaunch the worker before the state transitions to PM_RUN. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Here is a review comment for the latest v62-0002 changes. ====== src/backend/replication/logical/slotsync.c 1. + if (namestrcmp(&slot->data.plugin, remote_slot->plugin) == 0 && + slot->data.database == dbid && + remote_slot->restart_lsn == slot->data.restart_lsn && + remote_slot->catalog_xmin == slot->data.catalog_xmin && + remote_slot->two_phase == slot->data.two_phase && + remote_slot->failover == slot->data.failover && + remote_slot->confirmed_lsn == slot->data.confirmed_flush) + return false; For consistency, I think it would be better to always code the remote slot value on the LHS and the local slot value on the RHS, instead of the current random mix. And rename 'dbid' to 'remote_dbid' for name consistency too. SUGGESTION if (namestrcmp(remote_slot->plugin, &slot->data.plugin) == 0 && remote_dbid == slot->data.database && remote_slot->restart_lsn == slot->data.restart_lsn && remote_slot->catalog_xmin == slot->data.catalog_xmin && remote_slot->two_phase == slot->data.two_phase && remote_slot->failover == slot->data.failover && remote_slot->confirmed_lsn == slot->data.confirmed_flush) return false; ====== Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Jan 16, 2024 at 10:57 PM shveta malik <shveta.malik@gmail.com> wrote: > ... > v62-006: > Separated the failover-ready validation steps into this separate > doc-patch (which were earlier present in v61-002 and v61-003). Also > addressed some of the doc comments by Peter in [1]. > Thanks Hou-San for providing this patch. > > [1]: https://www.postgresql.org/message-id/CAHut%2BPteZVNx1jQ6Hs3mEdoC%3DDNALVpJJ2mZDYim7sU-04tiaw%40mail.gmail.com > Thanks for addressing my previous review in the new patch 0006. I checked it again and below are a few more comments ====== 1. GENERAL I was wondering if some other documentation (like somewhere from chapter 27, or maybe the pgctl promote docs?) should be referring back to this new information about how to decide if the standby is ready for promotion. ====== doc/src/sgml/logical-replication.sgml 2. + + <para> + Because the slot synchronization logic copies asynchronously, it is + necessary to confirm that replication slots have been synced to the standby + server before the failover happens. Furthermore, to ensure a successful + failover, the standby server must not be lagging behind the subscriber. It + is highly recommended to use <varname>standby_slot_names</varname> to + prevent the subscriber from consuming changes faster than the hot standby. + To confirm that the standby server is indeed ready for failover, follow + these 2 steps: + </para> For easier navigation, perhaps that standby_slot_names should include a link back to where the standby_slot_names GUC is described. ~~~ 3. + <substeps> + <step performance="required"> + <para> + Firstly, on the subscriber node, use the following SQL to identify the + slot names that should be synced to the standby that we plan to promote. Minor change to wording. SUGGESTION Firstly, on the subscriber node, use the following SQL to identify which slots should be synced to the standby that we plan to promote. ~~~ 4. +<programlisting> +test_sub=# SELECT + array_agg(slotname) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT s.oid AS subid, s.subslotname as slotname + FROM pg_subscription s + WHERE s.subfailover + )); + slots +------- + {sub} +(1 row) +</programlisting></para> + </step> I think the example might be better if the result shows > 1 slot. e.g. {sub1,sub2,sub3} This would also make the next step 1.b. more clear. ~~~ 5. +</programlisting></para> + </step> + <step performance="required"> + <para> + Next, check that the logical replication slots identified above exist on + the standby server. This step can be skipped if + <varname>standby_slot_names</varname> has been correctly configured. +<programlisting> +test_standby=# SELECT bool_and(synced AND NOT temporary AND conflict_reason IS NULL) AS failover_ready + FROM pg_replication_slots + WHERE slot_name in ('sub'); + failover_ready +---------------- + t +(1 row) +</programlisting></para> 5a. (uppercase SQL keyword) /in/IN/ ~ 5b. I felt this might be easier to understand if the SQL gives a two-column result instead of one all-of-nothing T/F where you might no be sure which slot was the one giving a problem. e.g. failover_ready | slot --------------------- t | sub1 t | sub2 f | sub3 ... ~~~ 6. + <para> + Firstly, on the subscriber node check the last replayed WAL. 
If the + query result is NULL, it indicates that the subscriber has not yet + replayed any WAL. Therefore, the next step can be skipped, as the + standby server must be ahead of the subscriber. IMO all of that part "If the query result is NULL" does not really belong here because it describes skipping the *next* step. So, it would be better to say this in the next step. Something like: SUGGESTION (for step 2b) Next, on the standby server check that the last-received WAL location is ahead of the replayed WAL location on the subscriber identified above. If the above SQL result was NULL, it means the subscriber has not yet replayed any WAL, so the standby server must be ahead of the subscriber, and this step can be skipped. ~~~ 7. +<programlisting> +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM + (( + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false) + WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) as remote_lsn + FROM pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT pg_replication_origin_progress(CONCAT('pg_' || s.oid), false) AS remote_lsn + FROM pg_subscription s + WHERE subfailover + )); + remote_lsn_on_subscriber +-------------------------- + 0/3000388 +</programlisting></para> 7a. (uppercase SQL keyword) /as/AS/ ~ 7b. missing table alias /WHERE subfailover/WHERE s.subfailover/ ~~~ 8. + </step> + <step performance="required"> + <para> + Next, on the standby server check that the last-received WAL location + is ahead of the replayed WAL location on the subscriber identified above. See the review comment above (#6) which suggested adding some more info here. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 17, 2024 at 6:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Jan 16, 2024 at 6:40 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > > > > There are multiple approaches discussed and tried when it comes to > > > > > > > starting a slot-sync worker. I am summarizing all here: > > > > > > > > > > > > > > 1) Make slotsync worker as an Auxiliary Process (like checkpointer, > > > > > > > walwriter, walreceiver etc). The benefit this approach provides is, it > > > > > > > can control begin and stop in a more flexible way as each auxiliary > > > > > > > process could have different checks before starting and can have > > > > > > > different stop conditions. But it needs code duplication for process > > > > > > > management(start, stop, crash handling, signals etc) and currently it > > > > > > > does not support db-connection smoothly (none of the auxiliary process > > > > > > > has one so far) > > > > > > > > > > > > > > > > > > > As slotsync worker needs to perform transactions and access syscache, > > > > > > we can't make it an auxiliary process as that doesn't initialize the > > > > > > required stuff like syscache. Also, see the comment "Auxiliary > > > > > > processes don't run transactions ..." in AuxiliaryProcessMain() which > > > > > > means this is not an option. > > > > > > > > > > > > > > > > > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher > > > > > > > which is neither an Auxiliary process nor a bgworker one. It allows > > > > > > > db-connection and also provides flexibility to have start and stop > > > > > > > conditions for a process. > > > > > > > > > > > > > > > > > > > Yeah, due to these reasons, I think this option is worth considering > > > > > > and another plus point is that this allows us to make enable_syncslot > > > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER. > > > > > > > > > > > > > > > > > > > > 3) Make slotysnc worker a bgworker. Here we just need to register our > > > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a > > > > > > > relevant start_time and restart_time and then the process management > > > > > > > is well taken care of. It does not need any code-duplication and > > > > > > > allows db-connection smoothly in registered process. The only thing it > > > > > > > lacks is that it does not provide flexibility of having > > > > > > > start-condition which then makes us to have 'enable_syncslot' as > > > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I > > > > > > > feel enable_syncslot is something which will not be changed frequently > > > > > > > and with the benefits provided by bgworker infra, it seems a > > > > > > > reasonably good option to choose this approach. > > > > > > > > > > > > > > > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter. > > > > > > > > > > > > > 4) Another option is to have Logical Replication Launcher(or a new > > > > > > > process) to launch slot-sync worker. 
But going by the current design > > > > > > > where we have only 1 slotsync worker, it may be an overhead to have an > > > > > > > additional manager process maintained. > > > > > > > > > > > > > > > > > > > I don't see any good reason to have an additional launcher process here. > > > > > > > > > > > > > > > > > > > > Thus weighing pros and cons of all these options, we have currently > > > > > > > implemented the bgworker approach (approach 3). Any feedback is > > > > > > > welcome. > > > > > > > > > > > > > > > > > > > I vote to go for (2) unless we face difficulties in doing so but (3) > > > > > > is also okay especially if others also think so. > > > > > > > > > > I am not against any of the approaches but I still feel that when we > > > > > have a standard way of doing things (bgworker) we should not keep > > > > > adding code to do things in a special way unless there is a strong > > > > > reason to do so. Now we need to decide if 'enable_syncslot' being > > > > > PGC_POSTMASTER is a strong reason to go the non-standard way? > > > > > > > > > > > > > Agreed and as said earlier I think it is better to make it a > > > > PGC_SIGHUP. Also, not sure we can say it is a non-standard way as > > > > already autovacuum launcher is handled in the same way. One more minor > > > > thing is it will save us for having a new bgworker state > > > > BgWorkerStart_ConsistentState_HotStandby as introduced by this patch. > > > > > > Why do we need to add a new BgWorkerStart_ConsistentState_HotStandby > > > for the slotsync worker? Isn't it sufficient that the slotsync worker > > > exits if not in hot standby mode? > > > > It is doable, but that will mean starting slot-sync worker even on > > primary on every server restart which does not seem like a good idea. > > We wanted to have a way where-in it does not start itself in > > non-standby mode. > > Understood. > > Another idea would be that the startup process dynamically registers > the slotsync worker if hot_standby is enabled. But it doesn't seem > like the right approach. > > > > > > Is there any technical difficulty or obstacle to make the slotsync > > > worker start using bgworker after reloading the config file? > > > > When we register slotsync worker as bgworker, we can only register the > > bgworker before initializing shared memory, we cannot register > > dynamically in the cycle of ServerLoop and thus we do not have > > flexibility of registering/deregistering the bgworker (or controlling > > the bgworker start) based on config parameters each time they change. > > We can always start slot-sync worker and let it check if > > enable_syncslot is ON. If not, exit and retry the next time when > > postmaster will restart it after restart_time(60sec). The downside of > > this approach is, even if any user does not want slot-sync > > functionality and thus has permanently disabled 'enable_syncslot', it > > will keep on restarting and exiting there. > > Thanks for the explanation. It sounds like it's not impossible but > would require some work. If allowing bgworkers to start also on SIGUP > is a general improvement, we can implement it later while having > enable_syncslot PGC_POSTMASTER at this time. Then, we will be able to > make the enable_syncslot PGC_SIGUP later. > > BTW I think I found a race condition in the v61 patch to cause that > the slotsync worker continues working even after promotion (I've not > tested with v62 patch though). 
At the time when the startup shutdown > the slotsync worker in FinishWalRecovery(), the postmaster's pmState > is still PM_HOT_STANDBY. And if the slotsync worker is not running > when the startup process attempts to shutdown it, ShutDownSlotSync() > does nothing. Therefore, after the startup process doesn't shutdown > the slotsync worker, the postmaster could relaunch the slotsync worker > before its state transition to PM_RUN. Yes, this race condition exists. We have attempted to fix it in v62-003 by introducing a 'stopSignaled' bool in the slot-sync worker's shared memory. The startup process sets it to true before shutting down the slotsync worker, and if the postmaster meanwhile ends up restarting the worker, ReplSlotSyncWorkerMain() will exit on seeing 'stopSignaled' set. This is along the lines of WalReceiver, where the postmaster starts it, the startup process shuts it down, and the similar race condition is handled by the state machinery in place (see WALRCV_STOPPING and WALRCV_STOPPED in ShutdownWalRcv() and WalReceiverMain()). Having said that, I feel that in the slotsync worker case: --I need to pull the race-condition fix around 'stopSignaled' into patch002 instead. --Also, in ShutDownSlotSync(), I need to move setting 'stopSignaled' to true before we exit on finding that 'SlotSyncWorker->pid' is InvalidPid. This also takes care of the scenario where no slot-sync worker was present when the startup process tried to shut it down; we need the flag set in that corner case too. thanks Shveta
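[A compact sketch of the ordering being described, illustrative only and not the v62/v63 code: the startup process sets stopSignaled under the spinlock before it checks the PID, and a concurrently launched worker checks the same flag before doing any work. It assumes the patch's SlotSyncWorker shared-memory struct with mutex, pid and stopSignaled fields; the signal used and the waiting logic are assumptions.]

/* Startup process side, called at the end of recovery (sketch). */
void
ShutDownSlotSyncSketch(void)
{
	SpinLockAcquire(&SlotSyncWorker->mutex);

	/* Set this first so that a worker launched concurrently will see it. */
	SlotSyncWorker->stopSignaled = true;

	if (SlotSyncWorker->pid == InvalidPid)
	{
		/* No worker running, but the flag above still blocks a relaunch. */
		SpinLockRelease(&SlotSyncWorker->mutex);
		return;
	}

	kill(SlotSyncWorker->pid, SIGINT);	/* the actual signal choice is an assumption */
	SpinLockRelease(&SlotSyncWorker->mutex);

	/* ... wait here until the worker clears its PID ... */
}

/* Worker side, early in ReplSlotSyncWorkerMain() (sketch). */
void
SlotSyncWorkerStartupCheckSketch(void)
{
	SpinLockAcquire(&SlotSyncWorker->mutex);
	Assert(SlotSyncWorker->pid == InvalidPid);
	if (SlotSyncWorker->stopSignaled)
	{
		/* Promotion in progress; do not start syncing slots. */
		SpinLockRelease(&SlotSyncWorker->mutex);
		proc_exit(0);
	}
	SlotSyncWorker->pid = MyProcPid;
	SpinLockRelease(&SlotSyncWorker->mutex);
}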
A review on v62-006: failover-ready validation steps doc - + Next, check that the logical replication slots identified above exist on + the standby server. This step can be skipped if + <varname>standby_slot_names</varname> has been correctly configured. +<programlisting> +test_standby=# SELECT bool_and(synced AND NOT temporary AND conflict_reason IS NULL) AS failover_ready + FROM pg_replication_slots + WHERE slot_name in ('sub'); + failover_ready +---------------- + t +(1 row) This query does not ensure that all the logical replication slots exist on standby. Due to the 'IN ('slots')' check, it will return 'true' even if only one or a few slots exist.
Hi, On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > PFA v62. Details: Thanks! > v62-003: > It is a new patch which attempts to implement slot-sync worker as a > special process which is neither a bgworker nor an Auxiliary process. > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > it is hot-standby and 'enable_syncslot' is ON. The implementation looks reasonable to me (from what I can see some parts is copy/paste from an already existing "special" process and some parts are "sync slot" specific) which makes fully sense. A few remarks: 1 === + * Was it the slot sycn worker? Typo: sycn 2 === + * ones), and no walwriter, autovac launcher or bgwriter or slot sync Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" 3 === + * restarting slot slyc worker. If stopSignaled is set, the worker will Typo: slyc 4 === +/* Flag to tell if we are in an slot sync worker process */ s/an/a/ ? 5 === (coming from v62-0002) + Assert(tuplestore_tuple_count(res->tuplestore) == 1); Is it even possible for the related query to not return only one row? (I think the "count" ensures it). 6 === if (conninfo_changed || primary_slotname_changed || + old_enable_syncslot != enable_syncslot || (old_hot_standby_feedback != hot_standby_feedback)) { ereport(LOG, errmsg("slot sync worker will restart because of" " a parameter change")); I don't think "slot sync worker will restart" is true if one change enable_syncslot from on to off. IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease the review). But let's wait to see if others think differently. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > > PFA v62. Details: > > Thanks! > > > v62-003: > > It is a new patch which attempts to implement slot-sync worker as a > > special process which is neither a bgworker nor an Auxiliary process. > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > > it is hot-standby and 'enable_syncslot' is ON. > > The implementation looks reasonable to me (from what I can see some parts is > copy/paste from an already existing "special" process and some parts are > "sync slot" specific) which makes fully sense. > > A few remarks: > > 1 === > + * Was it the slot sycn worker? > > Typo: sycn > > 2 === > + * ones), and no walwriter, autovac launcher or bgwriter or slot sync > > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" > > 3 === > + * restarting slot slyc worker. If stopSignaled is set, the worker will > > Typo: slyc > > 4 === > +/* Flag to tell if we are in an slot sync worker process */ > > s/an/a/ ? > > 5 === (coming from v62-0002) > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > Is it even possible for the related query to not return only one row? (I think the > "count" ensures it). > > 6 === > if (conninfo_changed || > primary_slotname_changed || > + old_enable_syncslot != enable_syncslot || > (old_hot_standby_feedback != hot_standby_feedback)) > { > ereport(LOG, > errmsg("slot sync worker will restart because of" > " a parameter change")); > > I don't think "slot sync worker will restart" is true if one change enable_syncslot > from on to off. > > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease > the review). But let's wait to see if others think differently. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com PFA v63. --It addresses comments by Peter given in [1], [2], comment by Nisha given in [3], comments by Bertrand given in [4] --It also moves race-condition fix from patch003 to patch002 as suggested by Swada-san offlist. Race-condition is mentioned in [5] All the changes are in patch02, patch003 and patch006. [1]: https://www.postgresql.org/message-id/CAHut%2BPuECB8fNBfXMdTHSMKF9kL%3D0XqPw1Am4NVahfJSSHzoYg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAHut%2BPt0uum%2B6Hg5UDofWMEJWhVEyArM1b0_B94UJmRcQmz7DA%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CABdArM73qdHyA0nteDLAQrfKNHRP%2B5Qq6p8uobg5bkE3EWiC%2Bg%40mail.gmail.com [4]: https://www.postgresql.org/message-id/ZaegJe9JpUiQeV%2BD%40ip-10-97-1-34.eu-west-3.compute.internal [5]: https://www.postgresql.org/message-id/CAD21AoA5izeKpp9Ei4Cd745pKX3wn-TRvhhmPFEW9UY1nx%2B_aw%40mail.gmail.com thanks Shveta
Attachment
- v63-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v63-0003-Slot-sync-worker-as-a-special-process.patch
- v63-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v63-0005-Non-replication-connection-and-app_name-change.patch
- v63-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v63-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > > PFA v62. Details: > > Thanks! > > > v62-003: > > It is a new patch which attempts to implement slot-sync worker as a > > special process which is neither a bgworker nor an Auxiliary process. > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > > it is hot-standby and 'enable_syncslot' is ON. > > The implementation looks reasonable to me (from what I can see some parts is > copy/paste from an already existing "special" process and some parts are > "sync slot" specific) which makes fully sense. Thanks for the feedback. I have addressed the comments in v63 except 5th one. > A few remarks: > > 1 === > + * Was it the slot sycn worker? > > Typo: sycn > > 2 === > + * ones), and no walwriter, autovac launcher or bgwriter or slot sync > > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" > > 3 === > + * restarting slot slyc worker. If stopSignaled is set, the worker will > > Typo: slyc > > 4 === > +/* Flag to tell if we are in an slot sync worker process */ > > s/an/a/ ? > > 5 === (coming from v62-0002) > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > Is it even possible for the related query to not return only one row? (I think the > "count" ensures it). I think you are right. This assertion was added sometime back on the basis of feedback on hackers. Let me review that again. I can consider this comment in the next version. > 6 === > if (conninfo_changed || > primary_slotname_changed || > + old_enable_syncslot != enable_syncslot || > (old_hot_standby_feedback != hot_standby_feedback)) > { > ereport(LOG, > errmsg("slot sync worker will restart because of" > " a parameter change")); > > I don't think "slot sync worker will restart" is true if one change enable_syncslot > from on to off. Yes, right. I have changed the log-msg in this specific case. > > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease > the review). But let's wait to see if others think differently. > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
I have one question about the new code in v63-0002. ====== src/backend/replication/logical/slotsync.c 1. ReplSlotSyncWorkerMain + Assert(SlotSyncWorker->pid == InvalidPid); + + /* + * Startup process signaled the slot sync worker to stop, so if meanwhile + * postmaster ended up starting the worker again, exit. + */ + if (SlotSyncWorker->stopSignaled) + { + SpinLockRelease(&SlotSyncWorker->mutex); + proc_exit(0); + } Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in such a way that SlotSyncWorker->stopSignaled was already assigned true, but SlotSyncWorker->pid was not yet reset to InvalidPid; e.g. Is the Assert above still OK? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v63. > 1. + /* User created slot with the same name exists, raise ERROR. */ + if (!synced) + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("exiting from slot synchronization on receiving" + " the failover slot \"%s\" from the primary server", + remote_slot->name), + errdetail("A user-created slot with the same name already" + " exists on the standby.")); I think here primary error message should contain the reason for failure. Something like: "exiting from slot synchronization because same name slot already exists on standby" then we can add more details in errdetail. 2. +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) { ... + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + xmin_horizon = GetOldestSafeDecodingTransactionId(true); + SpinLockAcquire(&slot->mutex); + slot->data.catalog_xmin = xmin_horizon; + SpinLockRelease(&slot->mutex); ... } Here, why slot->effective_catalog_xmin is not updated? The same is required by a later call to ReplicationSlotsComputeRequiredXmin(). I see that the prior version v60-0002 has the corresponding change but it is missing in the latest version. Any reason? 3. + * Return true either if the slot is marked as RS_PERSISTENT (sync-ready) or + * is synced periodically (if it was already sync-ready). Return false + * otherwise. + */ +static bool +update_and_persist_slot(RemoteSlot *remote_slot) The second part of the above comment (or is synced periodically (if it was already sync-ready)) is not clear to me. Does it intend to describe the case when we try to update the already created temp slot in the last call. If so, that is not very clear because periodically sounds like it can be due to repeated sync for sync-ready slot. 4. +update_and_persist_slot(RemoteSlot *remote_slot) { ... + (void) local_slot_update(remote_slot); ... } Can we write a comment to state the reason why we don't care about the return value here? -- With Regards, Amit Kapila.
On Wednesday, January 17, 2024 6:30 PM shveta malik <shveta.malik@gmail.com> wrote: > PFA v63. I analyzed the security of the slotsync worker and its replication connection a bit, and didn't find any issues. Here are the details: 1) data security First, we are using the role specified in primary_conninfo; that role is required to have the REPLICATION or SUPERUSER privilege [1], which means it is reasonable for the role to modify and read replication slots on the primary. On the primary, the slotsync worker only queries the pg_replication_slots view, which doesn't involve any system or user table access, so I think it's safe. On the standby server, the slot sync worker does not read or write any user tables either, so there is no risk of executing arbitrary code in triggers. 2) privilege check The SQL query issued by the slotsync worker goes through the normal privilege checks on the primary. If I revoke the execute privilege on pg_get_replication_slots from the replication user, then the slotsync worker is no longer able to query the pg_replication_slots view. The same is true for the pg_is_in_recovery function. The slotsync worker keeps reporting an ERROR after the revoke, which is as expected. Based on the above, I don't see any security issues with the slotsync worker. [1] https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-PRIMARY-CONNINFO Best Regards, Hou zj
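To make the privilege-check part of the analysis above easy to reproduce, here is a minimal SQL sketch along those lines; it is an illustration only, the role name sync_user is an assumption, and the REVOKEs target PUBLIC because EXECUTE on these catalog functions is normally granted to PUBLIC:

-- Run on the primary as a superuser; sync_user stands in for the role from primary_conninfo.
REVOKE EXECUTE ON FUNCTION pg_get_replication_slots() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_is_in_recovery() FROM PUBLIC;

-- Connected as sync_user (assumed to be a non-superuser), both of these should now
-- fail with a permission-denied error, matching the ERROR the worker keeps reporting.
SELECT * FROM pg_replication_slots;
SELECT pg_is_in_recovery();

-- Undo the experiment.
GRANT EXECUTE ON FUNCTION pg_get_replication_slots() TO PUBLIC;
GRANT EXECUTE ON FUNCTION pg_is_in_recovery() TO PUBLIC;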
On Tue, Jan 9, 2024 at 11:15 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, January 9, 2024 9:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ... > > > > 2. ALTER_REPLICATION_SLOT ... FAILOVER > > > > + <variablelist> > > + <varlistentry> > > + <term><literal>FAILOVER [ <replaceable > > class="parameter">boolean</replaceable> ]</literal></term> > > + <listitem> > > + <para> > > + If true, the slot is enabled to be synced to the physical > > + standbys so that logical replication can be resumed after failover. > > + </para> > > + </listitem> > > + </varlistentry> > > + </variablelist> > > > > This syntax says passing the boolean value is optional. So it needs to be > > specified here in the docs that not passing a value would be the same as > > passing the value true. > > The behavior that "not passing a value would be the same as passing the value > true " is due to the rule of defGetBoolean(). And all the options of commands > in this document behave the same in this case, therefore I think we'd better > add document for it in a general place in a separate patch/thread instead of > mentioning this in each option's paragraph. > Hi Hou-san, I did as suggested and posted a patch for this in another thread [1]. Please see if it is OK. ====== [1] https://www.postgresql.org/message-id/CAHut%2BPtDWSmW8uiRJF1LfGQJikmo7V2jdysLuRmtsanNZc7fNw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Jan 18, 2024 at 10:31 AM Peter Smith <smithpb2250@gmail.com> wrote: > > I have one question about the new code in v63-0002. > > ====== > src/backend/replication/logical/slotsync.c > > 1. ReplSlotSyncWorkerMain > > + Assert(SlotSyncWorker->pid == InvalidPid); > + > + /* > + * Startup process signaled the slot sync worker to stop, so if meanwhile > + * postmaster ended up starting the worker again, exit. > + */ > + if (SlotSyncWorker->stopSignaled) > + { > + SpinLockRelease(&SlotSyncWorker->mutex); > + proc_exit(0); > + } > > Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in > such a way that SlotSyncWorker->stopSignaled was already assigned > true, but SlotSyncWorker->pid was not yet reset to InvalidPid; > > e.g. Is the Assert above still OK? We are good with the Assert here. I tried below cases: 1) When slotsync worker is say killed using 'kill', it is considered as SIGTERM; slot sync worker invokes 'slotsync_worker_onexit()' before going down and thus sets SlotSyncWorker->pid = InvalidPid. This means when it is restarted (considering we have put the breakpoints in such a way that postmaster had already reached do_start_bgworker() before promotion finished), it is able to see stopSignaled set but pid is InvalidPid and thus we are good. 2) Another case is when we kill slot sync worker using 'kill -9' (or say we make it crash), in such a case, postmaster signals each sibling process to quit (including startup process) and cleans up the shared memory used by each (including SlotSyncWorker). In such a case promotion fails. And if slot sync worker is started again, it will find pid as InvalidPid. So we are good. thanks Shveta
On Wed, Jan 17, 2024 at 7:30 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote: > > > PFA v62. Details: > > > > Thanks! > > > > > v62-003: > > > It is a new patch which attempts to implement slot-sync worker as a > > > special process which is neither a bgworker nor an Auxiliary process. > > > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP > > > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if > > > it is hot-standby and 'enable_syncslot' is ON. > > > > The implementation looks reasonable to me (from what I can see some parts is > > copy/paste from an already existing "special" process and some parts are > > "sync slot" specific) which makes fully sense. > > > > A few remarks: > > > > 1 === > > + * Was it the slot sycn worker? > > > > Typo: sycn > > > > 2 === > > + * ones), and no walwriter, autovac launcher or bgwriter or slot sync > > > > Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync" > > > > 3 === > > + * restarting slot slyc worker. If stopSignaled is set, the worker will > > > > Typo: slyc > > > > 4 === > > +/* Flag to tell if we are in an slot sync worker process */ > > > > s/an/a/ ? > > > > 5 === (coming from v62-0002) > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > Is it even possible for the related query to not return only one row? (I think the > > "count" ensures it). > > > > 6 === > > if (conninfo_changed || > > primary_slotname_changed || > > + old_enable_syncslot != enable_syncslot || > > (old_hot_standby_feedback != hot_standby_feedback)) > > { > > ereport(LOG, > > errmsg("slot sync worker will restart because of" > > " a parameter change")); > > > > I don't think "slot sync worker will restart" is true if one change enable_syncslot > > from on to off. > > > > IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease > > the review). But let's wait to see if others think differently. > > > > Regards, > > > > -- > > Bertrand Drouvot > > PostgreSQL Contributors Team > > RDS Open Source Databases > > Amazon Web Services: https://aws.amazon.com > > > PFA v63. > > --It addresses comments by Peter given in [1], [2], comment by Nisha > given in [3], comments by Bertrand given in [4] > --It also moves race-condition fix from patch003 to patch002 as > suggested by Swada-san offlist. Race-condition is mentioned in [5] > Thank you for updating the patch. I have some comments: --- + latestWalEnd = GetWalRcvLatestWalEnd(); + if (remote_slot->confirmed_lsn > latestWalEnd) + { + elog(ERROR, "exiting from slot synchronization as the received slot sync" + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + remote_slot->name, + LSN_FORMAT_ARGS(latestWalEnd)); + } IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is typically the primary server's flush position and doesn't mean the LSN where the walreceiver received/flushed up to. Does it really happen that the slot's confirmed_flush_lsn is higher than the primary's flush lsn? 
--- After dropping a database on the primary, I got the following LOG (PID 2978463 is the slotsync worker on the standby): LOG: still waiting for backend with PID 2978463 to accept ProcSignalBarrier CONTEXT: WAL redo at 0/301CE00 for Database/DROP: dir 1663/16384 Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Jan 17, 2024 at 4:06 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > 5 === (coming from v62-0002) > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > Is it even possible for the related query to not return only one row? (I think the > > "count" ensures it). > > I think you are right. This assertion was added sometime back on the > basis of feedback on hackers. Let me review that again. I can consider > this comment in the next version. > OTOH, can't we keep the assert as it is but remove "= 1" from "count(*) = 1" in the query. There shouldn't be more than one slot with same name on the primary. Or, am I missing something? -- With Regards, Amit Kapila.
On Fri, Jan 19, 2024 at 11:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 5 === (coming from v62-0002) > > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > > > Is it even possible for the related query to not return only one row? (I think the > > > "count" ensures it). > > > > I think you are right. This assertion was added sometime back on the > > basis of feedback on hackers. Let me review that again. I can consider > > this comment in the next version. > > > > OTOH, can't we keep the assert as it is but remove "= 1" from > "count(*) = 1" in the query. There shouldn't be more than one slot > with same name on the primary. Or, am I missing something? There will be 1 record max and 0 record if the primary_slot_name is invalid. Keeping 'count(*)=1' gives the benefit that it will straight away give us true/false indicating if we are good or not wrt primary_slot_name. I feel Assert can be removed and we can simply have: if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) elog(ERROR, "failed to fetch primary_slot_name tuple"); thanks Shveta
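For readers following the discussion, the probe in question is roughly the following; this is a sketch reconstructed from the thread rather than the literal query in the 0002 patch, and the slot name sb1_slot is a placeholder for whatever primary_slot_name is set to. It checks, in a single round trip, that the primary is not itself in recovery and that the configured physical slot exists, and the aggregate guarantees exactly one result row, which is what the Assert-versus-elog debate is about:

SELECT pg_is_in_recovery(), count(*) = 1
FROM pg_replication_slots
WHERE slot_type = 'physical' AND slot_name = 'sb1_slot';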
Here are some review comments for patch v63-0003. ====== Commit Message 1. This patch attempts to start slot-sync worker as a special process which is neither a bgworker nor an Auxiliary process. The benefit we get here is we can control the start-conditions of the worker which further allows us to 'enable_syncslot' as PGC_SIGHUP which was otherwise a PGC_POSTMASTER GUC when slotsync worker was registered as bgworker. ~ missing word? /allows us to/allows us to define/ ====== src/backend/postmaster/postmaster.c 2. process_pm_child_exit + /* + * Was it the slot sync worker? Normal exit or FATAL exit (FATAL can + * be caused by libpqwalreceiver on receiving shutdown request by the + * startup process during promotion) can be ignored; we'll start a new + * one at the next iteration of the postmaster's main loop, if + * necessary. Any other exit condition is treated as a crash. + */ + if (pid == SlotSyncWorkerPID) + { + SlotSyncWorkerPID = 0; + if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus)) + HandleChildCrash(pid, exitstatus, + _("Slotsync worker process")); + continue; + } 2a. I think the 2nd sentence is easier to read if written like: Normal exit or FATAL exit can be ignored (FATAL can be caused by libpqwalreceiver on receiving shutdown request by the startup process during promotion); ~ 2b. All other names nearby are lowercase so maybe change "Slotsync worker process" to ""slotsync worker process" or ""slot sync worker process". ====== src/backend/replication/logical/slotsync.c 3. check_primary_info if (!valid) - ereport(ERROR, + { + *primary_slot_invalid = true; + ereport(LOG, errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("exiting from slot synchronization due to bad configuration"), + errmsg("skipping slot synchronization due to bad configuration"), /* translator: second %s is a GUC variable name */ errdetail("The primary server slot \"%s\" specified by %s is not valid.", PrimarySlotName, "primary_slot_name")); + } Somehow it seems more appropriate for the *caller* to decide what to do (e.g. "skipping...") when the primary slot is invalid. See also the next review comment #4b -- maybe just change this LOG to say "bad configuration for slot synchronization". ~~~ 4. /* * Check that all necessary GUCs for slot synchronization are set - * appropriately. If not, raise an ERROR. + * appropriately. If not, log the message and pass 'valid' as false + * to the caller. * * If all checks pass, extracts the dbname from the primary_conninfo GUC and * returns it. */ static char * -validate_parameters_and_get_dbname(void) +validate_parameters_and_get_dbname(bool *valid) 4a. This feels back-to-front. I think a "validate" function should return boolean. It can return the dbname as a side-effect only when it is valid. SUGGESTION static boolean validate_parameters_and_get_dbname(char *dbname) ~ 4b. It was a bit different when there were ERRORs but now they are LOGs somehow it seems wrong for this function to say what the *caller* will do. Maybe you can rewrite all the errmsg so the don't say "skipping" but they just say "bad configuration for slot synchronization" If valid is false then you can LOG "skipping" at the caller... ~~~ 5. wait_for_valid_params_and_get_dbname + dbname = validate_parameters_and_get_dbname(&valid); + if (valid) + break; + else This code will be simpler when the function is change to return boolean as suggested above in #4a. Also the 'else' is unnecessary. SUGGESTION if (validate_parameters_and_get_dbname(&dbname) break; ~ 6. 
+ if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); + + } + } Unnecessary blank line. ~~~ 7. slotsync_reread_config + if (old_enable_syncslot != enable_syncslot) + { + /* + * We have reached here, so old value must be true and new must be + * false. + */ + Assert(old_enable_syncslot); + Assert(!enable_syncslot); I felt it would be better just to say Assert(enable_syncslot); at the top of this function (before the ProcessConfigFile). Then none of this other comment/assert if really needed because it should be self-evident. ~~~ 8. StartSlotSyncWorker int StartSlotSyncWorker(void) { pid_t pid; #ifdef EXEC_BACKEND switch ((pid = slotsyncworker_forkexec())) #else switch ((pid = fork_process())) #endif { case -1: ereport(LOG, (errmsg("could not fork slot sync worker process: %m"))); return 0; #ifndef EXEC_BACKEND case 0: /* in postmaster child ... */ InitPostmasterChild(); /* Close the postmaster's sockets */ ClosePostmasterPorts(false); ReplSlotSyncWorkerMain(0, NULL); break; #endif default: return (int) pid; } /* shouldn't get here */ return 0; } The switch code can be rearranged so you don't need the #ifndef SUGGESTION #ifdef EXEC_BACKEND switch ((pid = slotsyncworker_forkexec())) { #else switch ((pid = fork_process())) { case 0: /* in postmaster child ... */ InitPostmasterChild(); /* Close the postmaster's sockets */ ClosePostmasterPorts(false); ReplSlotSyncWorkerMain(0, NULL); break; #endif case -1: ereport(LOG, (errmsg("could not fork slot sync worker process: %m"))); return 0; default: return (int) pid; } ====== src/backend/storage/lmgr/proc.c 9. InitProcess * this; it probably should.) + * + * Slot sync worker does not participate in it, see comments atop Backend. */ - if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess()) + if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess() && + !IsLogicalSlotSyncWorker()) MarkPostmasterChildActive(); 9a. /does not participate in it/also does not participate in it/ ~ 9b. It's not clear where "atop Backend" is referring to. ~~~ 10. * way, so tell the postmaster we've cleaned up acceptably well. (XXX * autovac launcher should be included here someday) */ - if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess()) + if (IsUnderPostmaster && !IsAutoVacuumLauncherProcess() && + !IsLogicalSlotSyncWorker()) MarkPostmasterChildInactive(); Should this comment also be updated to mention slot sync worker? ====== src/backend/utils/activity/pgstat_io.c 11. pgstat_tracks_io_bktype case B_WAL_SENDER: + case B_SLOTSYNC_WORKER: return true; } Notice all the other enums were arrange in alphabetical order, so do the same here. ====== src/backend/utils/init/miscinit.c 12. GetBackendTypeDesc + case B_SLOTSYNC_WORKER: + backendDesc = "slotsyncworker"; + break; } All the other case are in alphabetical order, same as the enum values, so do the same here. ~~~ 13. InitializeSessionUserIdStandalone * This function should only be called in single-user mode, in autovacuum * workers, and in background workers. */ - Assert(!IsUnderPostmaster || IsAutoVacuumWorkerProcess() || IsBackgroundWorker); + Assert(!IsUnderPostmaster || IsAutoVacuumWorkerProcess() || + IsLogicalSlotSyncWorker() || IsBackgroundWorker); Looks like this Assert has a stale comment that should be updated. ====== src/include/miscadmin.h 14. GetBackendTypeDesc B_WAL_SUMMARIZER, + B_SLOTSYNC_WORKER, B_WAL_WRITER, } BackendType; It seems strange to jam this new value among the other B_WAL enums. Anyway, it looks like everything else is in alphabetical order, so we do that too. ====== Kind Regards, Peter Smith. 
Fujitsu Australia
Hi, On Fri, Jan 19, 2024 at 11:46:51AM +0530, shveta malik wrote: > On Fri, Jan 19, 2024 at 11:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 5 === (coming from v62-0002) > > > > + Assert(tuplestore_tuple_count(res->tuplestore) == 1); > > > > > > > > Is it even possible for the related query to not return only one row? (I think the > > > > "count" ensures it). > > > > > > I think you are right. This assertion was added sometime back on the > > > basis of feedback on hackers. Let me review that again. I can consider > > > this comment in the next version. > > > > > > > OTOH, can't we keep the assert as it is but remove "= 1" from > > "count(*) = 1" in the query. There shouldn't be more than one slot > > with same name on the primary. Or, am I missing something? > > There will be 1 record max and 0 record if the primary_slot_name is > invalid. I think we'd have exactly one record in all the cases (due to the count):

postgres=# SELECT pg_is_in_recovery(), count(*) FROM pg_replication_slots WHERE 1 = 2;
 pg_is_in_recovery | count
-------------------+-------
 f                 |     0
(1 row)

postgres=# SELECT pg_is_in_recovery(), count(*) FROM pg_replication_slots WHERE 1 = 1;
 pg_is_in_recovery | count
-------------------+-------
 f                 |     1
(1 row)

> Keeping 'count(*)=1' gives the benefit that it will straight > away give us true/false indicating if we are good or not wrt > primary_slot_name. I feel Assert can be removed and we can simply > have: > > if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) > elog(ERROR, "failed to fetch primary_slot_name tuple"); > I'd also vote for keeping it as it is and remove the Assert. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Thank you for updating the patch. I have some comments: > > --- > + latestWalEnd = GetWalRcvLatestWalEnd(); > + if (remote_slot->confirmed_lsn > latestWalEnd) > + { > + elog(ERROR, "exiting from slot synchronization as the > received slot sync" > + " LSN %X/%X for slot \"%s\" is ahead of the > standby position %X/%X", > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > + remote_slot->name, > + LSN_FORMAT_ARGS(latestWalEnd)); > + } > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > typically the primary server's flush position and doesn't mean the LSN > where the walreceiver received/flushed up to. Yes. I think it makes more sense to use something that actually reports the flushed position. I gave it a try by replacing GetWalRcvLatestWalEnd() with GetWalRcvFlushRecPtr(), but I see a problem here. Let's say I have enabled the slot-sync feature in a running standby; in that case we are all good (flushedUpto is the same as the actual flush position indicated by LogstreamResult.Flush). But if I restart the standby, then I observed that the startup process sets flushedUpto to some value 'x' (see [1]), while when the wal-receiver starts, it sets 'LogstreamResult.Flush' to another value (see [2]) which is always greater than 'x'. And we do not update flushedUpto with the 'LogstreamResult.Flush' value in the walreceiver until we actually do an operation on the primary. Performing a data change on the primary sends WAL to the standby, which then hits XLogWalRcvFlush() and updates flushedUpto to the same value as LogstreamResult.Flush. Until then we have a situation where slots received on the standby are ahead of flushedUpto, and thus the slotsync worker keeps on erroring out. I have yet to find out why flushedUpto is set to a lower value than 'LogstreamResult.Flush' at the start of the standby. Or maybe I am using the wrong function (GetWalRcvFlushRecPtr()) and should be using something else instead? [1]: Startup process sets 'flushedUpto' here: ReadPageInternal-->XLogPageRead-->WaitForWALToBecomeAvailable-->RequestXLogStreaming [2]: Walreceiver sets 'LogstreamResult.Flush' here but does not update 'flushedUpto' here: WalReceiverMain(): LogstreamResult.Write = LogstreamResult.Flush = GetXLogReplayRecPtr(NULL) > Does it really happen > that the slot's confirmed_flush_lsn is higher than the primary's flush > lsn? It may happen if we have not configured standby_slot_names on the primary. In such a case, slots may get updated without confirming that the standby has received the change, and thus the slot-sync worker may fetch slots whose LSNs are ahead of the latest WAL position on the standby. thanks Shveta
On Fri, Jan 19, 2024 at 1:42 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Jan 19, 2024 at 11:46:51AM +0530, shveta malik wrote: > > Keeping 'count(*)=1' gives the benefit that it will straight > > away give us true/false indicating if we are good or not wrt > > primary_slot_name. I feel Assert can be removed and we can simply > > have: > > > > if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) > > elog(ERROR, "failed to fetch primary_slot_name tuple"); > > > > I'd also vote for keeping it as it is and remove the Assert. Sure, retained the query as is. Removed Assert. PFA v64. Changes are: 1) Addressed comments by Amit in [1]. 2) Addressed offlist comments given by Peter for documentation patch06. 3) Moved some docs to patch04 which were wrongly placed in patch02. 4) Addressed 1 pending comment from Bertrand (as stated above) to remove redundant Assert from check_primary_info() TODO: Address comments by Peter given in [2] [1]: https://www.postgresql.org/message-id/CAA4eK1LBnCjxBi7vPam0OfxsTEyHdvqx7goKxi1ePU45oz%3Dkhg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAHut%2BPt5Pk_xJkb54oahR%2Bf9oawgfnmbpewvkZPgnRhoJ3gkYg%40mail.gmail.com thanks Shveta
Attachment
- v64-0003-Slot-sync-worker-as-a-special-process.patch
- v64-0005-Non-replication-connection-and-app_name-change.patch
- v64-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v64-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v64-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v64-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
On Thu, Jan 18, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > 2. > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot) > { > ... > + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > + xmin_horizon = GetOldestSafeDecodingTransactionId(true); > + SpinLockAcquire(&slot->mutex); > + slot->data.catalog_xmin = xmin_horizon; > + SpinLockRelease(&slot->mutex); > ... > } > > Here, why slot->effective_catalog_xmin is not updated? The same is > required by a later call to ReplicationSlotsComputeRequiredXmin(). I > see that the prior version v60-0002 has the corresponding change but > it is missing in the latest version. Any reason? I think it was a mistake in v61. Added it back in v64.. > > 3. > + * Return true either if the slot is marked as RS_PERSISTENT (sync-ready) or > + * is synced periodically (if it was already sync-ready). Return false > + * otherwise. > + */ > +static bool > +update_and_persist_slot(RemoteSlot *remote_slot) > > The second part of the above comment (or is synced periodically (if it > was already sync-ready)) is not clear to me. Does it intend to > describe the case when we try to update the already created temp slot > in the last call. If so, that is not very clear because periodically > sounds like it can be due to repeated sync for sync-ready slot. The comment was as per old functionality where this function was doing persist and save both. In v61 code changed, but comment was not updated. I have changed it now in v64. > 4. > +update_and_persist_slot(RemoteSlot *remote_slot) > { > ... > + (void) local_slot_update(remote_slot); > ... > } > > Can we write a comment to state the reason why we don't care about the > return value here? Since it is the first time 'local_slot_update' is happening on any slot, the return value must be true i.e. local_slot_update() should not skip the update. I have thus added an Assert on return value now (in v64). thanks Shveta
On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > I had some off-list discussions with Sawada-San, Hou-San, and Shveta on the topic of extending replication commands instead of using the current model where we fetch the required slot information via SQL using a database connection. I would like to summarize the discussion and would like to know the thoughts of others on this topic. In the current patch, we launch the slotsync worker on the physical standby, which connects to the specified database (currently we let users specify the required dbname in primary_conninfo) on the primary. It then fetches the required information for failover marked slots from the primary and also does some primitive checks on the upstream node via SQL (the additional checks are like whether the upstream node has the specified physical slot or whether the upstream node is a primary node or a standby node). To fetch the required information it uses the libpqwalreceiver API, which is mostly apt for this purpose as it supports SQL execution, but for this patch, we don't need a replication connection, so we extend the libpqwalreceiver connect API. Now, the concern related to this could be that users would probably need to change existing mechanisms/tools to update primary_conninfo, and one of the alternatives proposed is to have an additional GUC like slot_sync_dbname. Users won't be able to drop the database this worker is connected to (aka whatever is specified in slot_sync_dbname), but as the user herself sets up the configuration it shouldn't be a big deal. Then we also discussed whether extending libpqwalreceiver's connect API is a good idea and whether we need to further extend it in the future. As far as I can see, the slotsync worker's primary requirement is to execute SQL queries, for which the current API is sufficient, and I don't see something that needs any drastic change in this API. Note that the tablesync worker, which executes SQL, also uses these APIs, so we may need something in the future for either of those. Then finally we need a slotsync worker to also connect to a database to use SQL and fetch results. Now, let us consider if we extend the replication commands like READ_REPLICATION_SLOT and/or introduce a new set of replication commands to fetch the required information; then we don't need a DB connection with the primary or a database connection in the slotsync worker. As per my current understanding, it is quite doable, but I think we will slowly go in the direction of making replication commands something like SQL, because today we need to extend them to fetch all slots info that have failover marked as true, check the existence of a particular replication slot, etc. Then tomorrow, if we want to extend this work to have multiple slotsync workers, say workers per db, then we have to extend the replication command to fetch per-database failover marked slots. To me, it sounds more like we are slowly adding SQL-like features to replication commands. Apart from this, when we are reading per-db replication slots without connecting to a database, we probably need some additional protection mechanism so that the database won't get dropped. Considering all this, it seems that for now extending replication commands can probably simplify a few things like those mentioned above, but using SQL with a db-connection is more extendable. Thoughts? -- With Regards, Amit Kapila.
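As a concrete illustration of the SQL-with-db-connection approach, the kind of query the slotsync worker could issue over a plain connection to the primary might look roughly like the following; this is a sketch based on the description above, not the patch's literal query, and the boolean failover column is the slot property added by the 0001 patch in this thread:

SELECT slot_name, plugin, database, two_phase,
       restart_lsn, confirmed_flush_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_type = 'logical' AND failover AND NOT temporary;

Whether this stays a plain SELECT or becomes a dedicated replication command is exactly the trade-off weighed in this message.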
On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > on the topic of extending replication commands instead of using the > current model where we fetch the required slot information via SQL > using a database connection. I would like to summarize the discussion > and would like to know the thoughts of others on this topic. > > In the current patch, we launch the slotsync worker on physical > standby which connects to the specified database (currently we let > users specify the required dbname in primary_conninfo) on the primary. > It then fetches the required information for failover marked slots > from the primary and also does some primitive checks on the upstream > node via SQL (the additional checks are like whether the upstream node > has a specified physical slot or whether the upstream node is a > primary node or a standby node). To fetch the required information it > uses a libpqwalreciever API which is mostly apt for this purpose as it > supports SQL execution but for this patch, we don't need a replication > connection, so we extend the libpqwalreciever connect API. What sort of extension we have done to 'libpqwalreciever'? Is it something like by default this supports replication connections so we have done an extension to the API so that we can provide an option whether to create a replication connection or a normal connection? > Now, the concerns related to this could be that users would probably > need to change existing mechanisms/tools to update priamry_conninfo > and one of the alternatives proposed is to have an additional GUC like > slot_sync_dbname. Users won't be able to drop the database this worker > is connected to aka whatever is specified in slot_sync_dbname but as > the user herself sets up the configuration it shouldn't be a big deal. Yeah for this purpose users may use template1 or so which they generally don't plan to drop. So in case the user wants to drop that database user needs to turn off the slot syncing option and then it can be done? > Then we also discussed whether extending libpqwalreceiver's connect > API is a good idea and whether we need to further extend it in the > future. As far as I can see, slotsync worker's primary requirement is > to execute SQL queries which the current API is sufficient, and don't > see something that needs any drastic change in this API. Note that > tablesync worker that executes SQL also uses these APIs, so we may > need something in the future for either of those. Then finally we need > a slotsync worker to also connect to a database to use SQL and fetch > results. While looking into the patch v64-0002 I could not exactly point out what sort of extensions are there in libpqwalreceiver.c, I just saw one extra API for fetching the dbname from connection info? > Now, let us consider if we extend the replication commands like > READ_REPLICATION_SLOT and or introduce a new set of replication > commands to fetch the required information then we don't need a DB > connection with primary or a connection in slotsync worker. As per my > current understanding, it is quite doable but I think we will slowly > go in the direction of making replication commands something like SQL > because today we need to extend it to fetch all slots info that have > failover marked as true, the existence of a particular replication, > etc. 
Then tomorrow, if we want to extend this work to have multiple > slotsync workers say workers perdb then we have to extend the > replication command to fetch per-database failover marked slots. To > me, it sounds more like we are slowly adding SQL-like features to > replication commands. > > Apart from this when we are reading per-db replication slots without > connecting to a database, we probably need some additional protection > mechanism so that the database won't get dropped. Something like locking the database only while fetching the slots? > Considering all this it seems that for now probably extending > replication commands can simplify a few things like mentioned above > but using SQL's with db-connection is more extendable. Even I have similar thoughts. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > on the topic of extending replication commands instead of using the > > current model where we fetch the required slot information via SQL > > using a database connection. I would like to summarize the discussion > > and would like to know the thoughts of others on this topic. > > > > In the current patch, we launch the slotsync worker on physical > > standby which connects to the specified database (currently we let > > users specify the required dbname in primary_conninfo) on the primary. > > It then fetches the required information for failover marked slots > > from the primary and also does some primitive checks on the upstream > > node via SQL (the additional checks are like whether the upstream node > > has a specified physical slot or whether the upstream node is a > > primary node or a standby node). To fetch the required information it > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > supports SQL execution but for this patch, we don't need a replication > > connection, so we extend the libpqwalreciever connect API. > > What sort of extension we have done to 'libpqwalreciever'? Is it > something like by default this supports replication connections so we > have done an extension to the API so that we can provide an option > whether to create a replication connection or a normal connection? > Yeah and in the future there could be more as well. The other function added walrcv_get_dbname_from_conninfo doesn't appear to be a problem either for now. > > Now, the concerns related to this could be that users would probably > > need to change existing mechanisms/tools to update priamry_conninfo > > and one of the alternatives proposed is to have an additional GUC like > > slot_sync_dbname. Users won't be able to drop the database this worker > > is connected to aka whatever is specified in slot_sync_dbname but as > > the user herself sets up the configuration it shouldn't be a big deal. > > Yeah for this purpose users may use template1 or so which they > generally don't plan to drop. > Using template1 has other problems like users won't be able to create a new database. See [2] (point number 2.2) > > So in case the user wants to drop that > database user needs to turn off the slot syncing option and then it > can be done? > Right. > > Then we also discussed whether extending libpqwalreceiver's connect > > API is a good idea and whether we need to further extend it in the > > future. As far as I can see, slotsync worker's primary requirement is > > to execute SQL queries which the current API is sufficient, and don't > > see something that needs any drastic change in this API. Note that > > tablesync worker that executes SQL also uses these APIs, so we may > > need something in the future for either of those. Then finally we need > > a slotsync worker to also connect to a database to use SQL and fetch > > results. > > While looking into the patch v64-0002 I could not exactly point out > what sort of extensions are there in libpqwalreceiver.c, I just saw > one extra API for fetching the dbname from connection info? > Right, the worry was that we may need it in the future. 
> > Now, let us consider if we extend the replication commands like > > READ_REPLICATION_SLOT and or introduce a new set of replication > > commands to fetch the required information then we don't need a DB > > connection with primary or a connection in slotsync worker. As per my > > current understanding, it is quite doable but I think we will slowly > > go in the direction of making replication commands something like SQL > > because today we need to extend it to fetch all slots info that have > > failover marked as true, the existence of a particular replication, > > etc. Then tomorrow, if we want to extend this work to have multiple > > slotsync workers say workers perdb then we have to extend the > > replication command to fetch per-database failover marked slots. To > > me, it sounds more like we are slowly adding SQL-like features to > > replication commands. > > > > Apart from this when we are reading per-db replication slots without > > connecting to a database, we probably need some additional protection > > mechanism so that the database won't get dropped. > > Something like locking the database only while fetching the slots? > Possible, but can we lock the database from an auxiliary process? > > Considering all this it seems that for now probably extending > > replication commands can simplify a few things like mentioned above > > but using SQL's with db-connection is more extendable. > > Even I have similar thoughts. > Thanks. [1] - https://www.postgresql.org/message-id/CAJpy0uBhPx1MDHh903XpFAhpBH23KzVXyg_4VjH2zXk81oGi1w%40mail.gmail.com -- With Regards, Amit Kapila.
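For context, the standby-side setup being discussed (the dbname carried in primary_conninfo, sync enabled via a reloadable GUC) could be sketched like this; the connection string, slot name, and role are placeholders, and enable_syncslot is the GUC name used by the patch versions in this thread:

ALTER SYSTEM SET primary_conninfo = 'host=primary port=5432 user=repl_user dbname=postgres';
ALTER SYSTEM SET primary_slot_name = 'sb1_slot';
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET enable_syncslot = on;
SELECT pg_reload_conf();

With the 0003 patch making enable_syncslot PGC_SIGHUP, turning it off before dropping the database named in primary_conninfo should only need a reload rather than a restart.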
On Fri, Jan 19, 2024 at 4:18 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v64. V64 fails to apply to HEAD due to a recent commit. Rebased it. PFA v64_2. It has no new changes. thanks Shveta
Attachment
- v64_2-0002-Add-logical-slot-sync-capability-to-the-physic.patch
- v64_2-0003-Slot-sync-worker-as-a-special-process.patch
- v64_2-0001-Enable-setting-failover-property-for-a-slot-th.patch
- v64_2-0005-Non-replication-connection-and-app_name-change.patch
- v64_2-0004-Allow-logical-walsenders-to-wait-for-the-physi.patch
- v64_2-0006-Document-the-steps-to-check-if-the-standby-is-.patch
On Fri, Jan 19, 2024 at 5:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > on the topic of extending replication commands instead of using the > current model where we fetch the required slot information via SQL > using a database connection. I would like to summarize the discussion > and would like to know the thoughts of others on this topic. > > In the current patch, we launch the slotsync worker on physical > standby which connects to the specified database (currently we let > users specify the required dbname in primary_conninfo) on the primary. > It then fetches the required information for failover marked slots > from the primary and also does some primitive checks on the upstream > node via SQL (the additional checks are like whether the upstream node > has a specified physical slot or whether the upstream node is a > primary node or a standby node). To fetch the required information it > uses a libpqwalreciever API which is mostly apt for this purpose as it > supports SQL execution but for this patch, we don't need a replication > connection, so we extend the libpqwalreciever connect API. > > Now, the concerns related to this could be that users would probably > need to change existing mechanisms/tools to update priamry_conninfo > and one of the alternatives proposed is to have an additional GUC like > slot_sync_dbname. Users won't be able to drop the database this worker > is connected to aka whatever is specified in slot_sync_dbname but as > the user herself sets up the configuration it shouldn't be a big deal. > Then we also discussed whether extending libpqwalreceiver's connect > API is a good idea and whether we need to further extend it in the > future. As far as I can see, slotsync worker's primary requirement is > to execute SQL queries which the current API is sufficient, and don't > see something that needs any drastic change in this API. Note that > tablesync worker that executes SQL also uses these APIs, so we may > need something in the future for either of those. Then finally we need > a slotsync worker to also connect to a database to use SQL and fetch > results. > > Now, let us consider if we extend the replication commands like > READ_REPLICATION_SLOT and or introduce a new set of replication > commands to fetch the required information then we don't need a DB > connection with primary or a connection in slotsync worker. As per my > current understanding, it is quite doable but I think we will slowly > go in the direction of making replication commands something like SQL > because today we need to extend it to fetch all slots info that have > failover marked as true, the existence of a particular replication, > etc. Then tomorrow, if we want to extend this work to have multiple > slotsync workers say workers perdb then we have to extend the > replication command to fetch per-database failover marked slots. To > me, it sounds more like we are slowly adding SQL-like features to > replication commands. > > Apart from this when we are reading per-db replication slots without > connecting to a database, we probably need some additional protection > mechanism so that the database won't get dropped. > > Considering all this it seems that for now probably extending > replication commands can simplify a few things like mentioned above > but using SQL's with db-connection is more extendable. > > Thoughts? 
Bertrand, and others, do you have an opinion on this matter? -- With Regards, Amit Kapila.
On Fri, Jan 19, 2024 at 3:55 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Thank you for updating the patch. I have some comments: > > > > --- > > + latestWalEnd = GetWalRcvLatestWalEnd(); > > + if (remote_slot->confirmed_lsn > latestWalEnd) > > + { > > + elog(ERROR, "exiting from slot synchronization as the > > received slot sync" > > + " LSN %X/%X for slot \"%s\" is ahead of the > > standby position %X/%X", > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > > + remote_slot->name, > > + LSN_FORMAT_ARGS(latestWalEnd)); > > + } > > > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > > typically the primary server's flush position and doesn't mean the LSN > > where the walreceiver received/flushed up to. > > yes. I think it makes more sense to use something which actually tells > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd() > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have > enabled the slot-sync feature in a running standby, in that case we > are all good (flushedUpto is the same as actual flush-position > indicated by LogstreamResult.Flush). But if I restart standby, then I > observed that the startup process sets flushedUpto to some value 'x' > (see [1]) while when the wal-receiver starts, it sets > 'LogstreamResult.Flush' to another value (see [2]) which is always > greater than 'x'. And we do not update flushedUpto with the > 'LogstreamResult.Flush' value in walreceiver until we actually do an > operation on primary. Performing a data change on primary sends WALs > to standby which then hits XLogWalRcvFlush() and updates flushedUpto > same as LogstreamResult.Flush. Until then we have a situation where > slots received on standby are ahead of flushedUpto and thus slotsync > worker keeps one erroring out. I am yet to find out why flushedUpto is > set to a lower value than 'LogstreamResult.Flush' at the start of > standby. Or maybe am I using the wrong function > GetWalRcvFlushRecPtr() and should be using something else instead? > Can we think of using GetStandbyFlushRecPtr()? We probably need to expose this function, if this works for the required purpose. -- With Regards, Amit Kapila.
Hi, On Fri, Jan 19, 2024 at 05:23:53PM +0530, Amit Kapila wrote: > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Now, the concerns related to this could be that users would probably > need to change existing mechanisms/tools to update priamry_conninfo Yeah, for the ones that want the sync slot feature. > and one of the alternatives proposed is to have an additional GUC like > slot_sync_dbname. Users won't be able to drop the database this worker > is connected to aka whatever is specified in slot_sync_dbname but as > the user herself sets up the configuration it shouldn't be a big deal. Same point of view here. > Then we also discussed whether extending libpqwalreceiver's connect > API is a good idea and whether we need to further extend it in the > future. As far as I can see, slotsync worker's primary requirement is > to execute SQL queries which the current API is sufficient, and don't > see something that needs any drastic change in this API. Note that > tablesync worker that executes SQL also uses these APIs, so we may > need something in the future for either of those. Then finally we need > a slotsync worker to also connect to a database to use SQL and fetch > results. > On my side the nits concerns about using the libpqrcv_connect / walrcv_connect are: - cosmetic: the "rcv" do not really align with the sync slot worker - we're using a WalReceiverConn, while a PGconn should suffice. From what I can see the "overhead" is (1 byte + 7 bytes hole + 8 bytes). I don't think that's a big deal even if we switch to a multi sync slot worker design later on. Those have already been discussed in [1] and I'm fine with them. > Now, let us consider if we extend the replication commands like > READ_REPLICATION_SLOT and or introduce a new set of replication > commands to fetch the required information then we don't need a DB > connection with primary or a connection in slotsync worker. As per my > current understanding, it is quite doable but I think we will slowly > go in the direction of making replication commands something like SQL > because today we need to extend it to fetch all slots info that have > failover marked as true, the existence of a particular replication, > etc. Then tomorrow, if we want to extend this work to have multiple > slotsync workers say workers perdb then we have to extend the > replication command to fetch per-database failover marked slots. To > me, it sounds more like we are slowly adding SQL-like features to > replication commands. Agree. Also it seems to me that extending the replication commands is more like a one-way door change. > Apart from this when we are reading per-db replication slots without > connecting to a database, we probably need some additional protection > mechanism so that the database won't get dropped. > > Considering all this it seems that for now probably extending > replication commands can simplify a few things like mentioned above > but using SQL's with db-connection is more extendable. I'd vote for using a SQL db-connection (like we are doing currently). It seems more extendable and more a two-way door (as compared to extending the replication commands): I think it still gives us the flexibility to switch to extending the replication commands if we want to in the future. [1]: https://www.postgresql.org/message-id/ZZe6sok7IWmhKReU%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Monday, January 22, 2024 11:36 AM shveta malik <shveta.malik@gmail.com> wrote: Hi, > On Fri, Jan 19, 2024 at 4:18 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v64. > > V64 fails to apply to HEAD due to a recent commit. Rebased it. PFA v64_2. It has > no new changes. I noticed a few things while analyzing the patch. 1. sleep_ms = Min(sleep_ms * 2, MAX_WORKER_NAPTIME_MS); The initial value of sleep_ms is 0 (the default value for a static variable), which will never be advanced by this expression. We should initialize sleep_ms to a positive number. 2. /* Wait a bit, we don't expect to have to wait long */ rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 10L, WAIT_EVENT_BGWORKER_SHUTDOWN); The slotsync worker is not a bgworker anymore after the 0003 patch, so I think a new wait event is needed here. 3. slot->effective_catalog_xmin = xmin_horizon; The assignment is also needed in local_slot_update() to make ReplicationSlotsComputeRequiredXmin work. Best Regards, Hou zj
On Mon, Jan 22, 2024 at 1:11 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > Thanks for sharing the feedback. > > > Then we also discussed whether extending libpqwalreceiver's connect > > API is a good idea and whether we need to further extend it in the > > future. As far as I can see, slotsync worker's primary requirement is > > to execute SQL queries which the current API is sufficient, and don't > > see something that needs any drastic change in this API. Note that > > tablesync worker that executes SQL also uses these APIs, so we may > > need something in the future for either of those. Then finally we need > > a slotsync worker to also connect to a database to use SQL and fetch > > results. > > > > On my side the nits concerns about using the libpqrcv_connect / walrcv_connect are: > > - cosmetic: the "rcv" do not really align with the sync slot worker > But note that the same API is even used for apply worker as well. One can think that this is a connection used to receive WAL or slot_info. minor comments on the patch: ======================= 1. + /* First time slot update, the function must return true */ + Assert(local_slot_update(remote_slot)); Isn't moving this code to Assert in update_and_persist_slot() wrong? It will make this function call no-op in non-assert builds? 2. + ereport(LOG, + errmsg("newly locally created slot \"%s\" is sync-ready now", I think even without 'locally' in the above LOG message, it is clear. 3. +/* + * walrcv_get_dbinfo_for_failover_slots_fn + * + * Run LIST_DBID_FOR_FAILOVER_SLOTS on primary server to get the + * list of unique DBIDs for failover logical slots + */ +typedef List *(*walrcv_get_dbinfo_for_failover_slots_fn) (WalReceiverConn *conn); This looks like a leftover from the previous version of the patch. -- With Regards, Amit Kapila.
On Mon, Jan 22, 2024 at 3:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > minor comments on the patch: > ======================= PFA v65. It addresses the comments by Peter in [1], the comments by Hou-san in [2], and the comments by Amit in [3] and [4]. TODO: Analyze the issue reported by Sawada-san in [5] (pt 2); disallow subscription creation on standby with failover=true (as we do not support sync on cascading standbys). [1]: https://www.postgresql.org/message-id/CAHut%2BPt5Pk_xJkb54oahR%2Bf9oawgfnmbpewvkZPgnRhoJ3gkYg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/OS0PR01MB57160C7184E17C6765AAE38294752%40OS0PR01MB5716.jpnprd01.prod.outlook.com [3]: https://www.postgresql.org/message-id/CAA4eK1JPB-zpGYTbVOP5Qp26tNQPMjDuYzNZ%2Ba9RFiN5nE1tEA%40mail.gmail.com [4]: https://www.postgresql.org/message-id/CAA4eK1Jhy1-bsu6vc0%3DNja7aw5-EK_%3D101pnnuM3ATqTA8%2B%3DSg%40mail.gmail.com [5]: https://www.postgresql.org/message-id/CAD21AoBgzONdt3o5mzbQ4MtqAE%3DWseiXUOq0LMqne-nWGjZBsA%40mail.gmail.com thanks Shveta
Attachment
- v65-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v65-0005-Non-replication-connection-and-app_name-change.patch
- v65-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v65-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v65-0003-Slot-sync-worker-as-a-special-process.patch
- v65-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
On Mon, Jan 22, 2024 at 12:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 19, 2024 at 3:55 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > Thank you for updating the patch. I have some comments: > > > > > > --- > > > + latestWalEnd = GetWalRcvLatestWalEnd(); > > > + if (remote_slot->confirmed_lsn > latestWalEnd) > > > + { > > > + elog(ERROR, "exiting from slot synchronization as the > > > received slot sync" > > > + " LSN %X/%X for slot \"%s\" is ahead of the > > > standby position %X/%X", > > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > > > + remote_slot->name, > > > + LSN_FORMAT_ARGS(latestWalEnd)); > > > + } > > > > > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > > > typically the primary server's flush position and doesn't mean the LSN > > > where the walreceiver received/flushed up to. > > > > yes. I think it makes more sense to use something which actually tells > > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd() > > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have > > enabled the slot-sync feature in a running standby, in that case we > > are all good (flushedUpto is the same as actual flush-position > > indicated by LogstreamResult.Flush). But if I restart standby, then I > > observed that the startup process sets flushedUpto to some value 'x' > > (see [1]) while when the wal-receiver starts, it sets > > 'LogstreamResult.Flush' to another value (see [2]) which is always > > greater than 'x'. And we do not update flushedUpto with the > > 'LogstreamResult.Flush' value in walreceiver until we actually do an > > operation on primary. Performing a data change on primary sends WALs > > to standby which then hits XLogWalRcvFlush() and updates flushedUpto > > same as LogstreamResult.Flush. Until then we have a situation where > > slots received on standby are ahead of flushedUpto and thus slotsync > > worker keeps on erroring out. I am yet to find out why flushedUpto is > > set to a lower value than 'LogstreamResult.Flush' at the start of > > standby. Or maybe am I using the wrong function > > GetWalRcvFlushRecPtr() and should be using something else instead? > > > > Can we think of using GetStandbyFlushRecPtr()? We probably need to > expose this function, if this works for the required purpose. I think we can. For the record, the problem with using flushedUpto (or GetWalRcvFlushRecPtr()) directly is that it is not set to the latest flushed position immediately after startup. It points to some prior location (perhaps a segment or page start) after startup until some data is flushed next, which then updates it to the latest flushed position, so we cannot use it directly. GetStandbyFlushRecPtr(), OTOH, takes care of this, i.e. it returns the correct flushed location at any point in time. I have changed v65 to use this one. thanks Shveta
On Fri, Jan 19, 2024 at 11:48 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v63-0003. Thanks Peter. I have addressed all in v65. > > 4b. > It was a bit different when there were ERRORs but now they are LOGs; > somehow it seems wrong for this function to say what the *caller* will > do. Maybe you can rewrite all the errmsg so they don't say "skipping" > but they just say "bad configuration for slot synchronization" > > If valid is false then you can LOG "skipping" at the caller... I have made this change, but now in the log file we see 3 log lines like below; does it seem apt? Was the earlier one better, where we get the info in 2 lines? [34416] LOG: bad configuration for slot synchronization [34416] HINT: hot_standby_feedback must be enabled. [34416] LOG: skipping slot synchronization thanks Shveta
On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > on the topic of extending replication commands instead of using the > > > current model where we fetch the required slot information via SQL > > > using a database connection. I would like to summarize the discussion > > > and would like to know the thoughts of others on this topic. > > > > > > In the current patch, we launch the slotsync worker on physical > > > standby which connects to the specified database (currently we let > > > users specify the required dbname in primary_conninfo) on the primary. > > > It then fetches the required information for failover marked slots > > > from the primary and also does some primitive checks on the upstream > > > node via SQL (the additional checks are like whether the upstream node > > > has a specified physical slot or whether the upstream node is a > > > primary node or a standby node). To fetch the required information it > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > supports SQL execution but for this patch, we don't need a replication > > > connection, so we extend the libpqwalreciever connect API. > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > something like by default this supports replication connections so we > > have done an extension to the API so that we can provide an option > > whether to create a replication connection or a normal connection? > > > > Yeah and in the future there could be more as well. The other function > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > either for now. > > > > Now, the concerns related to this could be that users would probably > > > need to change existing mechanisms/tools to update priamry_conninfo I'm concerned about this. In fact, a primary_conninfo value generated by pg_basebackup does not work with enable_syncslot. > > > and one of the alternatives proposed is to have an additional GUC like > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > Yeah for this purpose users may use template1 or so which they > > generally don't plan to drop. > > > > Using template1 has other problems like users won't be able to create > a new database. See [2] (point number 2.2) > > > > > So in case the user wants to drop that > > database user needs to turn off the slot syncing option and then it > > can be done? > > > > Right. If the user wants to continue using slot syncing, they need to switch the database to connect. Which requires modifying primary_conninfo and reloading the configuration file. Which further leads to restarting the physical replication. If they use synchronous replication, it means the application temporarily stops during that. > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > API is a good idea and whether we need to further extend it in the > > > future. 
As far as I can see, slotsync worker's primary requirement is > > > to execute SQL queries which the current API is sufficient, and don't > > > see something that needs any drastic change in this API. Note that > > > tablesync worker that executes SQL also uses these APIs, so we may > > > need something in the future for either of those. Then finally we need > > > a slotsync worker to also connect to a database to use SQL and fetch > > > results. > > > > While looking into the patch v64-0002 I could not exactly point out > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > one extra API for fetching the dbname from connection info? > > > > Right, the worry was that we may need it in the future. Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a non-replication connection and to execute SQL query. But neither of them are relevant with replication. I'm a bit concerned that when we need to extend the slotsync feature in the future we will end up extending libpqwalreceiver, even if the new feature is not also relevant with replication. > > > > Now, let us consider if we extend the replication commands like > > > READ_REPLICATION_SLOT and or introduce a new set of replication > > > commands to fetch the required information then we don't need a DB > > > connection with primary or a connection in slotsync worker. As per my > > > current understanding, it is quite doable but I think we will slowly > > > go in the direction of making replication commands something like SQL > > > because today we need to extend it to fetch all slots info that have > > > failover marked as true, the existence of a particular replication, > > > etc. Then tomorrow, if we want to extend this work to have multiple > > > slotsync workers say workers perdb then we have to extend the > > > replication command to fetch per-database failover marked slots. To > > > me, it sounds more like we are slowly adding SQL-like features to > > > replication commands. Right. How about filtering slots on the standby side? That is, for example, the LIST_SLOT command returns all slots and the slotsync worker filters out non-failover slots. Also such command could potentially be used also in client tools like pg_basebackup, pg_receivewal, and pg_recvlogical to list the available replication slots to specify. > > > Considering all this it seems that for now probably extending > > > replication commands can simplify a few things like mentioned above > > > but using SQL's with db-connection is more extendable. > > Agreed. Having said that, considering Amit, Bertrand, and Dilip already agreed with the current design (using SQL's with db-connection), I might be worrying too much. So we can probably go with the current design and improve it if we find some problems. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
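To make the "SQL over a db-connection" approach discussed above concrete, the slotsync worker's fetch presumably boils down to something like the following sketch. The exact column list, constants, and error wording are guesses rather than the patch's code; the failover column is the one the patch adds to pg_replication_slots:

WalRcvExecResult *res;
Oid			slotRow[3] = {TEXTOID, TEXTOID, LSNOID};

/* Illustrative only: fetch failover-enabled logical slots over a normal
 * (non-replication) connection established through libpqwalreceiver. */
res = walrcv_exec(wrconn,
				  "SELECT slot_name, plugin, confirmed_flush_lsn"
				  " FROM pg_catalog.pg_replication_slots"
				  " WHERE failover AND NOT temporary",
				  3, slotRow);
if (res->status != WALRCV_OK_TUPLES)
	ereport(ERROR,
			errmsg("could not fetch failover slot information from the primary: %s",
				   res->err));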
On Mon, Jan 22, 2024 at 5:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > > on the topic of extending replication commands instead of using the > > > > current model where we fetch the required slot information via SQL > > > > using a database connection. I would like to summarize the discussion > > > > and would like to know the thoughts of others on this topic. > > > > > > > > In the current patch, we launch the slotsync worker on physical > > > > standby which connects to the specified database (currently we let > > > > users specify the required dbname in primary_conninfo) on the primary. > > > > It then fetches the required information for failover marked slots > > > > from the primary and also does some primitive checks on the upstream > > > > node via SQL (the additional checks are like whether the upstream node > > > > has a specified physical slot or whether the upstream node is a > > > > primary node or a standby node). To fetch the required information it > > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > > supports SQL execution but for this patch, we don't need a replication > > > > connection, so we extend the libpqwalreciever connect API. > > > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > > something like by default this supports replication connections so we > > > have done an extension to the API so that we can provide an option > > > whether to create a replication connection or a normal connection? > > > > > > > Yeah and in the future there could be more as well. The other function > > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > > either for now. > > > > > > Now, the concerns related to this could be that users would probably > > > > need to change existing mechanisms/tools to update priamry_conninfo > > I'm concerned about this. In fact, a primary_conninfo value generated > by pg_basebackup does not work with enable_syncslot. > Right, but if we want can't we extend pg_basebackup to do that? It is just that I am not sure that it is a good idea to extend pg_basebackup in the first version. > > > > and one of the alternatives proposed is to have an additional GUC like > > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > > > Yeah for this purpose users may use template1 or so which they > > > generally don't plan to drop. > > > > > > > Using template1 has other problems like users won't be able to create > > a new database. See [2] (point number 2.2) > > > > > > > > So in case the user wants to drop that > > > database user needs to turn off the slot syncing option and then it > > > can be done? > > > > > > > Right. > > If the user wants to continue using slot syncing, they need to switch > the database to connect. Which requires modifying primary_conninfo and > reloading the configuration file. 
Which further leads to restarting > the physical replication. If they use synchronous replication, it > means the application temporarily stops during that. > Yes, that would be an inconvenience but the point is we don't expect this to change often. > > > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > > API is a good idea and whether we need to further extend it in the > > > > future. As far as I can see, slotsync worker's primary requirement is > > > > to execute SQL queries which the current API is sufficient, and don't > > > > see something that needs any drastic change in this API. Note that > > > > tablesync worker that executes SQL also uses these APIs, so we may > > > > need something in the future for either of those. Then finally we need > > > > a slotsync worker to also connect to a database to use SQL and fetch > > > > results. > > > > > > While looking into the patch v64-0002 I could not exactly point out > > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > > one extra API for fetching the dbname from connection info? > > > > > > > Right, the worry was that we may need it in the future. > > Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a > non-replication connection and to execute SQL query. But neither of > them are relevant with replication. > But we are already using libpqwalreceiver to execute SQL queries via tablesync worker. I'm a bit concerned that when we > need to extend the slotsync feature in the future we will end up > extending libpqwalreceiver, even if the new feature is not also > relevant with replication. > > > > > > > Now, let us consider if we extend the replication commands like > > > > READ_REPLICATION_SLOT and or introduce a new set of replication > > > > commands to fetch the required information then we don't need a DB > > > > connection with primary or a connection in slotsync worker. As per my > > > > current understanding, it is quite doable but I think we will slowly > > > > go in the direction of making replication commands something like SQL > > > > because today we need to extend it to fetch all slots info that have > > > > failover marked as true, the existence of a particular replication, > > > > etc. Then tomorrow, if we want to extend this work to have multiple > > > > slotsync workers say workers perdb then we have to extend the > > > > replication command to fetch per-database failover marked slots. To > > > > me, it sounds more like we are slowly adding SQL-like features to > > > > replication commands. > > Right. How about filtering slots on the standby side? That is, for > example, the LIST_SLOT command returns all slots and the slotsync > worker filters out non-failover slots. > Yeah, we can do that but it could be unnecessary network overhead when there are very few failover slots. And it may not be just one time, we need to fetch slot information periodically. -- With Regards, Amit Kapila.
On Mon, Jan 22, 2024 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 22, 2024 at 5:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > > > on the topic of extending replication commands instead of using the > > > > > current model where we fetch the required slot information via SQL > > > > > using a database connection. I would like to summarize the discussion > > > > > and would like to know the thoughts of others on this topic. > > > > > > > > > > In the current patch, we launch the slotsync worker on physical > > > > > standby which connects to the specified database (currently we let > > > > > users specify the required dbname in primary_conninfo) on the primary. > > > > > It then fetches the required information for failover marked slots > > > > > from the primary and also does some primitive checks on the upstream > > > > > node via SQL (the additional checks are like whether the upstream node > > > > > has a specified physical slot or whether the upstream node is a > > > > > primary node or a standby node). To fetch the required information it > > > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > > > supports SQL execution but for this patch, we don't need a replication > > > > > connection, so we extend the libpqwalreciever connect API. > > > > > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > > > something like by default this supports replication connections so we > > > > have done an extension to the API so that we can provide an option > > > > whether to create a replication connection or a normal connection? > > > > > > > > > > Yeah and in the future there could be more as well. The other function > > > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > > > either for now. > > > > > > > > Now, the concerns related to this could be that users would probably > > > > > need to change existing mechanisms/tools to update priamry_conninfo > > > > I'm concerned about this. In fact, a primary_conninfo value generated > > by pg_basebackup does not work with enable_syncslot. > > > > Right, but if we want can't we extend pg_basebackup to do that? It is > just that I am not sure that it is a good idea to extend pg_basebackup > in the first version. Okay. > > > > > > and one of the alternatives proposed is to have an additional GUC like > > > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > > > > > Yeah for this purpose users may use template1 or so which they > > > > generally don't plan to drop. > > > > > > > > > > Using template1 has other problems like users won't be able to create > > > a new database. See [2] (point number 2.2) > > > > > > > > > > > So in case the user wants to drop that > > > > database user needs to turn off the slot syncing option and then it > > > > can be done? > > > > > > > > > > Right. 
> > > > If the user wants to continue using slot syncing, they need to switch > > the database to connect. Which requires modifying primary_conninfo and > > reloading the configuration file. Which further leads to restarting > > the physical replication. If they use synchronous replication, it > > means the application temporarily stops during that. > > > > Yes, that would be an inconvenience but the point is we don't expect > this to change often. > > > > > > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > > > API is a good idea and whether we need to further extend it in the > > > > > future. As far as I can see, slotsync worker's primary requirement is > > > > > to execute SQL queries which the current API is sufficient, and don't > > > > > see something that needs any drastic change in this API. Note that > > > > > tablesync worker that executes SQL also uses these APIs, so we may > > > > > need something in the future for either of those. Then finally we need > > > > > a slotsync worker to also connect to a database to use SQL and fetch > > > > > results. > > > > > > > > While looking into the patch v64-0002 I could not exactly point out > > > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > > > one extra API for fetching the dbname from connection info? > > > > > > > > > > Right, the worry was that we may need it in the future. > > > > Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a > > non-replication connection and to execute SQL query. But neither of > > them are relevant with replication. > > > > But we are already using libpqwalreceiver to execute SQL queries via > tablesync worker. IIUC tablesync workers do both SQL queries and replication commands. I think the slotsync worker is the first background process who does only SQL queries in a non-replication command ( using libpqwalreceiver). > > I'm a bit concerned that when we > > need to extend the slotsync feature in the future we will end up > > extending libpqwalreceiver, even if the new feature is not also > > relevant with replication. > > > > > > > > > > Now, let us consider if we extend the replication commands like > > > > > READ_REPLICATION_SLOT and or introduce a new set of replication > > > > > commands to fetch the required information then we don't need a DB > > > > > connection with primary or a connection in slotsync worker. As per my > > > > > current understanding, it is quite doable but I think we will slowly > > > > > go in the direction of making replication commands something like SQL > > > > > because today we need to extend it to fetch all slots info that have > > > > > failover marked as true, the existence of a particular replication, > > > > > etc. Then tomorrow, if we want to extend this work to have multiple > > > > > slotsync workers say workers perdb then we have to extend the > > > > > replication command to fetch per-database failover marked slots. To > > > > > me, it sounds more like we are slowly adding SQL-like features to > > > > > replication commands. > > > > Right. How about filtering slots on the standby side? That is, for > > example, the LIST_SLOT command returns all slots and the slotsync > > worker filters out non-failover slots. > > > > Yeah, we can do that but it could be unnecessary network overhead when > there are very few failover slots. And it may not be just one time, we > need to fetch slot information periodically. True. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Here are some review comments for v65-0002 ====== 0. General - GUCs in messages I think it would be better for the GUC names to all be quoted. It's not a rule (yet), but OTOH it seems to be the consensus most people want. See [1]. This might impact the following messages: 0.1 + ereport(ERROR, + errmsg("could not fetch primary_slot_name \"%s\" info from the" + " primary server: %s", PrimarySlotName, res->err)); SUGGESTION errmsg("could not fetch primary server slot \"%s\" info from the primary server: %s", ...) errhint("Check if \"primary_slot_name\" is configured correctly."); ~~~ 0.2 + if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) + elog(ERROR, "failed to fetch primary_slot_name tuple"); SUGGESTION elog(ERROR, "failed to fetch tuple for the primary server slot specified by \"primary_slot_name\""); ~~~ 0.3 + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("exiting from slot synchronization due to bad configuration"), + /* translator: second %s is a GUC variable name */ + errdetail("The primary server slot \"%s\" specified by %s is not valid.", + PrimarySlotName, "primary_slot_name")); /specified by %s/specified by \"%s\"/ ~~~ 0.4 + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errmsg("exiting from slot synchronization due to bad configuration"), + errhint("%s must be defined.", "primary_slot_name")); /%s must be defined./\"%s\" must be defined./ ~~~ 0.5 + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errmsg("exiting from slot synchronization due to bad configuration"), + errhint("%s must be enabled.", "hot_standby_feedback")); /%s must be enabled./\"%s\" must be enabled./ ~~~ 0.6 + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errmsg("exiting from slot synchronization due to bad configuration"), + errhint("wal_level must be >= logical.")); errhint("\"wal_level\" must be >= logical.")) ~~~ 0.7 + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errmsg("exiting from slot synchronization due to bad configuration"), + errhint("%s must be defined.", "primary_conninfo")); /%s must be defined./\"%s\" must be defined./ ~~~ 0.8 + ereport(ERROR, + + /* + * translator: 'dbname' is a specific option; %s is a GUC variable + * name + */ + errmsg("exiting from slot synchronization due to bad configuration"), + errhint("'dbname' must be specified in %s.", "primary_conninfo")); /must be specified in %s./must be specified in \"%s\"./ ~~~ 0.9 + ereport(LOG, + errmsg("skipping slot synchronization"), + errdetail("enable_syncslot is disabled.")); errdetail("\"enable_syncslot\" is disabled.")); ====== src/backend/replication/logical/slotsync.c 1. +/* Min and Max sleep time for slot sync worker */ +#define MIN_WORKER_NAPTIME_MS 200 +#define MAX_WORKER_NAPTIME_MS 30000 /* 30s */ + +/* + * Sleep time in ms between slot-sync cycles. + * See wait_for_slot_activity() for how we adjust this + */ +static long sleep_ms = MIN_WORKER_NAPTIME_MS; These all belong together, so I think they share a combined comment like: SUGGESTION The sleep time (ms) between slot-sync cycles varies dynamically (within a MIN/MAX range) according to slot activity. See wait_for_slot_activity() for details. ~~~ 2. update_and_persist_slot + /* First time slot update, the function must return true */ + if(!local_slot_update(remote_slot)) + elog(ERROR, "failed to update slot"); Missing whitespace after 'if' ~~~ 3. 
synchronize_one_slot + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("exiting from slot synchronization because same" + " name slot \"%s\" already exists on standby", + remote_slot->name), + errdetail("A user-created slot with the same name as" + " failover slot already exists on the standby.")); 3a. /on standby/on the standby/ ~ 3b. Now the errmsg is changed, the errdetail doesn't seem so useful. Isn't it repeating pretty much the same information as in the errmsg? ====== src/backend/replication/walsender.c 4. GetStandbyFlushRecPtr /* - * Returns the latest point in WAL that has been safely flushed to disk, and - * can be sent to the standby. This should only be called when in recovery, - * ie. we're streaming to a cascaded standby. + * Returns the latest point in WAL that has been safely flushed to disk. + * This should only be called when in recovery. + * Since it says "This should only be called when in recovery", should there also be a check for that (e.g. RecoveryInProgress) in the added Assert? ====== src/include/replication/walreceiver.h 5. typedef char *(*walrcv_identify_system_fn) (WalReceiverConn *conn, TimeLineID *primary_tli); +/* + * walrcv_get_dbname_from_conninfo_fn + * + * Returns the dbid from the primary_conninfo + */ +typedef char *(*walrcv_get_dbname_from_conninfo_fn) (const char *conninfo); It looks like a blank line that previously existed has been lost. ====== [1] https://www.postgresql.org/message-id/CAHut%2BPsf3NewXbsFKY88Qn1ON1_dMD6343MuWdMiiM2Ds9a_wA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 22, 2024 at 8:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Jan 22, 2024 at 9:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a > > > non-replication connection and to execute SQL query. But neither of > > > them are relevant with replication. > > > > > > > But we are already using libpqwalreceiver to execute SQL queries via > > tablesync worker. > > IIUC tablesync workers do both SQL queries and replication commands. I > think the slotsync worker is the first background process who does > only SQL queries in a non-replication command ( using > libpqwalreceiver). > Yes, I agree, but till now we haven't seen any problem with the same. -- With Regards, Amit Kapila.
>
> On Mon, Jan 22, 2024 at 3:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > minor comments on the patch:
> > =======================
>
> PFA v65 addressing the comments.
>
> Addressed comments by Peter in [1], comments by Hou-San in [2],
> comments by Amit in [3] and [4]
>
> TODO:
> Analyze the issue reported by Swada-san in [5] (pt 2)
> Disallow subscription creation on standby with failover=true (as we do
> not support sync on cascading standbys)
>
> [1]: https://www.postgresql.org/message-id/CAHut%2BPt5Pk_xJkb54oahR%2Bf9oawgfnmbpewvkZPgnRhoJ3gkYg%40mail.gmail.com
> [2]: https://www.postgresql.org/message-id/OS0PR01MB57160C7184E17C6765AAE38294752%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> [3]: https://www.postgresql.org/message-id/CAA4eK1JPB-zpGYTbVOP5Qp26tNQPMjDuYzNZ%2Ba9RFiN5nE1tEA%40mail.gmail.com
> [4]: https://www.postgresql.org/message-id/CAA4eK1Jhy1-bsu6vc0%3DNja7aw5-EK_%3D101pnnuM3ATqTA8%2B%3DSg%40mail.gmail.com
> [5]: https://www.postgresql.org/message-id/CAD21AoBgzONdt3o5mzbQ4MtqAE%3DWseiXUOq0LMqne-nWGjZBsA%40mail.gmail.com
>
>
I was doing some testing on this. What I noticed is that creating subscriptions with failover enabled is taking a lot longer compared with a subscription with failover disabled. The setup has the primary configured with standby_slot_names, and that standby has enable_syncslot turned on.
Publisher has one publication, no tables.
subscriber:
postgres=# \timing
Timing is on.
postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost port=6972' PUBLICATION pub with (failover = true);
NOTICE: created replication slot "sub" on publisher
CREATE SUBSCRIPTION
Time: 10011.829 ms (00:10.012)
== drop the sub
postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost port=6972' PUBLICATION pub with (failover = false);
NOTICE: created replication slot "sub" on publisher
CREATE SUBSCRIPTION
Time: 46.317 ms
With failover=true, it takes 10011 ms while failover=false takes 46 ms.
I don't see a similar delay when creating slot on the primary with pg_create_logical_replication_slot() with failover flag enabled.
Then on primary:
postgres=# SELECT 'init' FROM pg_create_logical_replication_slot('lsub2_slot', 'pgoutput', false, false, true);
?column?
----------
init
(1 row)
Time: 36.125 ms
postgres=# SELECT 'init' FROM pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false, false);
?column?
----------
init
(1 row)
Time: 53.981 ms
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Jan 23, 2024 at 2:38 PM Ajin Cherian <itsajin@gmail.com> wrote: > > I was doing some testing on this. What I noticed is that creating subscriptions with failover enabled is taking a lot longer compared with a subscription with failover disabled. The setup has the primary configured with standby_slot_names and that standby is enabled with enable_syncslot turned on. > Thanks Ajin for testing the patch. PFA v66 which fixes this issue. The overall changes in this version are: patch 001: 1) Restricted enabling failover for user-created slots on standby. 2) Fixed a wrong NOTICE during alter-sub which was always saying that 'changed the failover state to false' even if it was switched to true. patch 002: 3) Addressed Peter's comment in [1] patch 003: 4) Fixed the drop-db issue reported by Sawada-San in [2] 5) Added other signal-handlers. 6) Fixed CFBot Windows compilation failure. patch 004: 7) Fixed the issue reported by Ajin above in [3]. The performance issue was due to the additional wait in WalSndWaitForWal() for failover slots. Create Subscription calls DecodingContextFindStartpoint() which then reads WALs to build the initial snapshot, which ends up calling WalSndWaitForWal(), which waits for standby confirmation in the case of failover slots. Addressed it by skipping the wait during Create Sub as it is not needed there. We now wait only if 'replication_active' is true. Thanks Nisha for reporting the NOTICE issue (addressed in 2) and working on issue #6. Thanks Hou-San for working on #7. [1]: https://www.postgresql.org/message-id/CAHut%2BPs6p6Km8_Hfy6X0KTuyqBKkhC84u23sQnnkhqkHuDL%2BDQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAD21AoBgzONdt3o5mzbQ4MtqAE%3DWseiXUOq0LMqne-nWGjZBsA%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAFPTHDbsZ%2BpxAubb9d9BwVNt5OB3_2s77bG6nHcAgUPPhEVmMQ%40mail.gmail.com thanks Shveta
Attachment
- v66-0003-Slot-sync-worker-as-a-special-process.patch
- v66-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v66-0005-Non-replication-connection-and-app_name-change.patch
- v66-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v66-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v66-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
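For the CREATE SUBSCRIPTION delay described in the mail above, the fix boils down to making the wait in WalSndWaitForWal() conditional. A rough sketch of that shape; only replication_active and the slot's failover flag come from the thread, and the helper name is hypothetical:

/*
 * Sketch only: wait for the standbys listed in standby_slot_names to confirm
 * only once streaming has actually started (replication_active), i.e. not
 * while the initial snapshot is being built during CREATE SUBSCRIPTION.
 */
if (MyReplicationSlot->data.failover && replication_active)
	WalSndWaitForStandbyConfirmation(loc);	/* hypothetical helper */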
On Tue, Jan 23, 2024 at 9:45 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v65-0002 Thanks Peter for the feedback. I have addressed these in v66. > > 4. GetStandbyFlushRecPtr > > /* > - * Returns the latest point in WAL that has been safely flushed to disk, and > - * can be sent to the standby. This should only be called when in recovery, > - * ie. we're streaming to a cascaded standby. > + * Returns the latest point in WAL that has been safely flushed to disk. > + * This should only be called when in recovery. > + * > > Since it says "This should only be called when in recovery", should > there also be a check for that (e.g. RecoveryInProgress) in the added > Assert? Since 'am_cascading_walsender' and 'IsLogicalSlotSyncWorker' make sense 'in recovery' only, I think an explicit check for 'RecoveryInProgress' is not needed here. But I can add it if others also think it is needed. thanks Shveta
Here are some comments for patch v66-0001. ====== doc/src/sgml/catalogs.sgml 1. + <para> + If true, the associated replication slots (i.e. the main slot and the + table sync slots) in the upstream database are enabled to be + synchronized to the physical standbys + </para></entry> /physical standbys/physical standby/ I wondered if it is better just to say singular "standby" instead of "standbys" in places like this; e.g. plural might imply cascading for some readers. There are a number of examples like this, so I've repeated the same comment multiple times below. If you disagree, please just ignore all of them. ====== doc/src/sgml/func.sgml 2. that the decoding of prepared transactions is enabled for this - slot. A call to this function has the same effect as the replication - protocol command <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>. + slot. The optional fifth parameter, + <parameter>failover</parameter>, when set to true, + specifies that this slot is enabled to be synced to the + physical standbys so that logical replication can be resumed + after failover. A call to this function has the same effect as + the replication protocol command + <literal>CREATE_REPLICATION_SLOT ... LOGICAL</literal>. </para></entry> (same as above) /physical standbys/physical standby/ Also, I don't see anything else on this page using plural "standbys". ====== doc/src/sgml/protocol.sgml 3. CREATE_REPLICATION_SLOT + <varlistentry> + <term><literal>FAILOVER [ <replaceable class="parameter">boolean</replaceable> ]</literal></term> + <listitem> + <para> + If true, the slot is enabled to be synced to the physical + standbys so that logical replication can be resumed after failover. + The default is false. + </para> + </listitem> + </varlistentry> (same as above) /physical standbys/physical standby/ ~~~ 4. ALTER_REPLICATION_SLOT + <variablelist> + <varlistentry> + <term><literal>FAILOVER [ <replaceable class="parameter">boolean</replaceable> ]</literal></term> + <listitem> + <para> + If true, the slot is enabled to be synced to the physical + standbys so that logical replication can be resumed after failover. + </para> + </listitem> + </varlistentry> + </variablelist> (same as above) /physical standbys/physical standby/ ====== doc/src/sgml/ref/create_subscription.sgml 5. + <varlistentry id="sql-createsubscription-params-with-failover"> + <term><literal>failover</literal> (<type>boolean</type>)</term> + <listitem> + <para> + Specifies whether the replication slots associated with the subscription + are enabled to be synced to the physical standbys so that logical + replication can be resumed from the new primary after failover. + The default is <literal>false</literal>. + </para> + </listitem> + </varlistentry> (same as above) /physical standbys/physical standby/ ====== doc/src/sgml/system-views.sgml 6. + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>failover</structfield> <type>bool</type> + </para> + <para> + True if this is a logical slot enabled to be synced to the physical + standbys so that logical replication can be resumed from the new primary + after failover. Always false for physical slots. + </para></entry> + </row> (same as above) /physical standbys/physical standby/ ====== src/backend/commands/subscriptioncmds.c 7. 
+ if (IsSet(opts.specified_opts, SUBOPT_FAILOVER)) + { + if (!sub->slotname) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set failover for a subscription that does not have a slot name"))); + + /* + * Do not allow changing the failover state if the + * subscription is enabled. This is because the failover + * state of the slot on the publisher cannot be modified if + * the slot is currently acquired by the apply worker. + */ + if (sub->enabled) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set %s for enabled subscription", + "failover"))); + + values[Anum_pg_subscription_subfailover - 1] = + BoolGetDatum(opts.failover); + replaces[Anum_pg_subscription_subfailover - 1] = true; + } The first message is not consistent with the second. The "failover" option maybe should be extracted so it won't be translated. SUGGESTION errmsg("cannot set %s for a subscription that does not have a slot name", "failover") ~~~ 8. AlterSubscription + if (!wrconn) + ereport(ERROR, + (errcode(ERRCODE_CONNECTION_FAILURE), + errmsg("could not connect to the publisher: %s", err))); + Need to keep an eye on the patch proposed by Nisha [1] for messages similar to this one, so in case that gets pushed this code should be changed appropriately. ====== src/backend/replication/slot.c 9. * during getting changes, if the two_phase option is enabled it can skip * prepare because by that time start decoding point has been moved. So the * user will only get commit prepared. + * failover: If enabled, allows the slot to be synced to physical standbys so + * that logical replication can be resumed after failover. */ (same as earlier) /physical standbys/physical standby/ ~~~ 10. + /* + * Do not allow users to alter slots to enable failover on the standby + * as we do not support sync to the cascading standby. + */ + if (RecoveryInProgress() && failover) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot alter replication slot to have failover" + " enabled on the standby")); I felt the errmsg could be expressed with less ambiguity: SUGGESTION: cannot enable failover for a replication slot on the standby ====== src/backend/replication/slotfuncs.c 11. create_physical_replication_slot /* acquire replication slot, this will check for conflicting names */ ReplicationSlotCreate(name, false, - temporary ? RS_TEMPORARY : RS_PERSISTENT, false); + temporary ? RS_TEMPORARY : RS_PERSISTENT, false, + false); Having an inline comment might be helpful here instead of passing "false,false" SUGGESTION ReplicationSlotCreate(name, false, temporary ? RS_TEMPORARY : RS_PERSISTENT, false, false /* failover */); ~~~ 12. create_logical_replication_slot + /* + * Do not allow users to create the slots with failover enabled on the + * standby as we do not support sync to the cascading standby. + */ + if (RecoveryInProgress() && failover) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot create replication slot with failover" + " enabled on the standby")); (similar to previous comment) SUGGESTION: cannot enable failover for a replication slot created on the standby ~~~ 13. copy_replication_slot * hence pass find_startpoint false. confirmed_flush will be set * below, by copying from the source slot. + * + * To avoid potential issues with the slotsync worker when the + * restart_lsn of a replication slot goes backwards, we set the + * failover option to false here. 
This situation occurs when a slot on + * the primary server is dropped and immediately replaced with a new + * slot of the same name, created by copying from another existing + * slot. However, the slotsync worker will only observe the restart_lsn + * of the same slot going backwards. */ create_logical_replication_slot(NameStr(*dst_name), plugin, temporary, false, + false, src_restart_lsn, false); (similar to an earlier comment) Having an inline comment might be helpful here. e.g. false /* failover */, ====== src/backend/replication/walreceiver.c 14. - walrcv_create_slot(wrconn, slotname, true, false, 0, NULL); + walrcv_create_slot(wrconn, slotname, true, false, false, 0, NULL); (similar to an earlier comment) Having an inline comment might be helpful here: SUGGESTION walrcv_create_slot(wrconn, slotname, true, false, false /* failover */, 0, NULL); ====== src/backend/replication/walsender.c 15. CreateReplicationSlot ReplicationSlotCreate(cmd->slotname, false, cmd->temporary ? RS_TEMPORARY : RS_PERSISTENT, - false); + false, false); (similar to an earlier comment) Having an inline comment might be helpful here. e.g. false /* failover */, ~~~ 16. CreateReplicationSlot + /* + * Do not allow users to create the slots with failover enabled on the + * standby as we do not support sync to the cascading standby. + */ + if (RecoveryInProgress() && failover) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot create replication slot with failover" + " enabled on the standby")); + /* * Initially create persistent slot as ephemeral - that allows us to * nicely handle errors during initialization because it'll get @@ -1243,7 +1265,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd) */ ReplicationSlotCreate(cmd->slotname, true, cmd->temporary ? RS_TEMPORARY : RS_EPHEMERAL, - two_phase); + two_phase, failover); This errmsg seems to be repeated in a few places, so I wondered if this code can be refactored to call direct to create_logical_replication_slot() so the errmsg can be just once in a common place. OTOH, if it cannot be refactored, then needs to be using same errmsg as suggested by earlier review comments (see above). ====== src/include/catalog/pg_subscription.h 17. + bool subfailover; /* True if the associated replication slots + * (i.e. the main slot and the table sync + * slots) in the upstream database are enabled + * to be synchronized to the physical + * standbys. */ (same as earlier) /physical standbys/physical standby/ ~~~ 18. + bool failover; /* True if the associated replication slots + * (i.e. the main slot and the table sync + * slots) in the upstream database are enabled + * to be synchronized to the physical + * standbys. */ (same as earlier) /physical standbys/physical standby/ ====== src/include/replication/slot.h 19. + + /* + * Is this a failover slot (sync candidate for physical standbys)? Only + * relevant for logical slots on the primary server. + */ + bool failover; (same as earlier) /physical standbys/physical standby/ ====== [1] Nisha errmsg - https://www.postgresql.org/message-id/CABdArM5-VR4Akt_AHap_0Ofne0cTcsdnN6FcNe%2BMU8eXsa_ERQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 24, 2024 at 8:52 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some comments for patch v66-0001. > > ====== > doc/src/sgml/catalogs.sgml > > 1. > + <para> > + If true, the associated replication slots (i.e. the main slot and the > + table sync slots) in the upstream database are enabled to be > + synchronized to the physical standbys > + </para></entry> > > /physical standbys/physical standby/ > > I wondered if it is better just to say singular "standby" instead of > "standbys" in places like this; e.g. plural might imply cascading for > some readers. > I don't think it is confusing, as we use it in a similar way in the docs. We can probably avoid using "physical" in places similar to the above, as that is implied. > > > ====== > src/backend/replication/slotfuncs.c > > 11. create_physical_replication_slot > > /* acquire replication slot, this will check for conflicting names */ > ReplicationSlotCreate(name, false, > - temporary ? RS_TEMPORARY : RS_PERSISTENT, false); > + temporary ? RS_TEMPORARY : RS_PERSISTENT, false, > + false); > > Having an inline comment might be helpful here instead of passing "false,false" > > SUGGESTION > ReplicationSlotCreate(name, false, > temporary ? RS_TEMPORARY : RS_PERSISTENT, false, > false /* failover */); > I don't think we follow the practice of using inline comments. I feel that sometimes makes code difficult to read, especially when we have multiple such parameters. -- With Regards, Amit Kapila.
On Mon, Jan 22, 2024 at 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 19, 2024 at 3:55 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > Thank you for updating the patch. I have some comments: > > > > > > --- > > > + latestWalEnd = GetWalRcvLatestWalEnd(); > > > + if (remote_slot->confirmed_lsn > latestWalEnd) > > > + { > > > + elog(ERROR, "exiting from slot synchronization as the > > > received slot sync" > > > + " LSN %X/%X for slot \"%s\" is ahead of the > > > standby position %X/%X", > > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > > > + remote_slot->name, > > > + LSN_FORMAT_ARGS(latestWalEnd)); > > > + } > > > > > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > > > typically the primary server's flush position and doesn't mean the LSN > > > where the walreceiver received/flushed up to. > > > > yes. I think it makes more sense to use something which actually tells > > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd() > > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have > > enabled the slot-sync feature in a running standby, in that case we > > are all good (flushedUpto is the same as actual flush-position > > indicated by LogstreamResult.Flush). But if I restart standby, then I > > observed that the startup process sets flushedUpto to some value 'x' > > (see [1]) while when the wal-receiver starts, it sets > > 'LogstreamResult.Flush' to another value (see [2]) which is always > > greater than 'x'. And we do not update flushedUpto with the > > 'LogstreamResult.Flush' value in walreceiver until we actually do an > > operation on primary. Performing a data change on primary sends WALs > > to standby which then hits XLogWalRcvFlush() and updates flushedUpto > > same as LogstreamResult.Flush. Until then we have a situation where > > slots received on standby are ahead of flushedUpto and thus slotsync > > worker keeps one erroring out. I am yet to find out why flushedUpto is > > set to a lower value than 'LogstreamResult.Flush' at the start of > > standby. Or maybe am I using the wrong function > > GetWalRcvFlushRecPtr() and should be using something else instead? > > > > Can we think of using GetStandbyFlushRecPtr()? We probably need to > expose this function, if this works for the required purpose. GetStandbyFlushRecPtr() seems good. But do we really want to raise an ERROR in this case? IIUC this case could happen often when the slot used by the standby is not listed in standby_slot_names. I think we can just skip such a slot to synchronize and check it the next time. Here are random comments on slotsyncworker.c (v66): --- The postmaster relaunches the slotsync worker without intervals. So if a connection string in primary_conninfo is not correct, many errors are emitted. --- +/* GUC variable */ +bool enable_syncslot = false; Is enable_syncslot a really good name? We use "enable" prefix only for planner parameters such as enable_seqscan, and it seems to me that "slot" is not specific. 
Other candidates are: * synchronize_replication_slots = on|off * synchronize_failover_slots = on|off --- + elog(ERROR, + "cannot synchronize local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization" + " would move it backwards", remote_slot->name, Many error messages in slotsync.c are split across several lines, which I think reduces greppability when the user looks for the error message in the source code. --- + SpinLockAcquire(&slot->mutex); + slot->data.database = get_database_oid(remote_slot->database, false); + namestrcpy(&slot->data.plugin, remote_slot->plugin); We should not access syscaches while holding a spinlock. --- + SpinLockAcquire(&slot->mutex); + slot->data.database = get_database_oid(remote_slot->database, false); + namestrcpy(&slot->data.plugin, remote_slot->plugin); + SpinLockRelease(&slot->mutex); Similarly, it's better to avoid calling namestrcpy() while holding a spinlock, as we do in CreateInitDecodingContext(). --- + SpinLockAcquire(&SlotSyncWorker->mutex); + + SlotSyncWorker->stopSignaled = true; + + if (SlotSyncWorker->pid == InvalidPid) + { + SpinLockRelease(&SlotSyncWorker->mutex); + return; + } + + kill(SlotSyncWorker->pid, SIGINT); + + SpinLockRelease(&SlotSyncWorker->mutex); It's better to avoid making a system call while holding a spinlock. --- + BackgroundWorkerUnblockSignals(); I think it's no longer necessary. --- + ereport(LOG, + /* translator: %s is a GUC variable name */ + errmsg("bad configuration for slot synchronization"), + errhint("\"wal_level\" must be >= logical.")); There is no '%s' in the errmsg string. --- +/* + * Cleanup function for logical replication launcher. + * + * Called on logical replication launcher exit. + */ IIUC this function is never called by the logical replication launcher. --- + /* + * The slot sync worker can not get here because it will only stop when it + * receives a SIGINT from the logical replication launcher, or when there + * is an error. + */ + Assert(false); This comment is not correct. IIUC the slotsync worker receives a SIGINT from the startup process. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
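For the spinlock comments above, the usual shape of the fix is to do the syscache lookup and the string copy outside the critical section and only publish plain assignments under the mutex, as CreateInitDecodingContext() already does; a sketch of that shape, not the patch's final code:

Oid			remote_dbid;
NameData	plugin_name;

/* Do the catalog lookup and the copy before taking the spinlock */
remote_dbid = get_database_oid(remote_slot->database, false);
namestrcpy(&plugin_name, remote_slot->plugin);

SpinLockAcquire(&slot->mutex);
slot->data.database = remote_dbid;
slot->data.plugin = plugin_name;	/* plain struct assignment under the lock */
SpinLockRelease(&slot->mutex);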
On Wed, Jan 24, 2024 at 10:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Jan 22, 2024 at 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Can we think of using GetStandbyFlushRecPtr()? We probably need to > > expose this function, if this works for the required purpose. > > GetStandbyFlushRecPtr() seems good. But do we really want to raise an > ERROR in this case? IIUC this case could happen often when the slot > used by the standby is not listed in standby_slot_names. > or it can be due to some bug in the code as well. > I think we > can just skip such a slot to synchronize and check it the next time. > How about logging the message and then skipping the sync step? This will at least make users aware that they might have missed setting standby_slot_names. > Here are random comments on slotsyncworker.c (v66): > > +/* GUC variable */ > +bool enable_syncslot = false; > > Is enable_syncslot a really good name? We use "enable" prefix only for > planner parameters such as enable_seqscan, and it seems to me that > "slot" is not specific. Other candidates are: > > * synchronize_replication_slots = on|off > * synchronize_failover_slots = on|off > I would prefer the second one. Would it be better to just say sync_failover_slots? -- With Regards, Amit Kapila.
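A rough sketch of that log-and-skip shape, assuming the caller iterates over the remote slots and treats a false return as "retry in the next sync cycle"; names and message wording are illustrative, not the patch's code:

XLogRecPtr	standby_flush_lsn = GetStandbyFlushRecPtr(NULL);	/* assuming NULL timeline is accepted */

if (remote_slot->confirmed_lsn > standby_flush_lsn)
{
	ereport(LOG,
			errmsg("skipping synchronization of slot \"%s\"", remote_slot->name),
			errdetail("The remote slot's LSN %X/%X is ahead of the standby's flush position %X/%X.",
					  LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
					  LSN_FORMAT_ARGS(standby_flush_lsn)),
			errhint("Check that this standby's slot is listed in \"standby_slot_names\" on the primary."));
	return false;		/* check this slot again in the next sync cycle */
}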
On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 24, 2024 at 10:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Jan 22, 2024 at 3:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Can we think of using GetStandbyFlushRecPtr()? We probably need to > > > expose this function, if this works for the required purpose. > > > > GetStandbyFlushRecPtr() seems good. But do we really want to raise an > > ERROR in this case? IIUC this case could happen often when the slot > > used by the standby is not listed in standby_slot_names. > > > > or it can be due to some bug in the code as well. > > > I think we > > can just skip such a slot to synchronize and check it the next time. > > > > How about logging the message and then skipping the sync step? This > will at least make users aware that they could be missing to set > standby_slot_names. +1 > > > Here are random comments on slotsyncworker.c (v66): > > > > +/* GUC variable */ > > +bool enable_syncslot = false; > > > > Is enable_syncslot a really good name? We use "enable" prefix only for > > planner parameters such as enable_seqscan, and it seems to me that > > "slot" is not specific. Other candidates are: > > > > * synchronize_replication_slots = on|off > > * synchronize_failover_slots = on|off > > > > I would prefer the second one. Would it be better to just say > sync_failover_slots? Works for me. But if we want to extend this option for non-failover slots as well in the future, synchronize_replication_slots (or sync_replication_slots) seems better. We can extend it by having an enum later. For example, the values can be on, off, or failover etc. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
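If the GUC were later generalized to an enum as suggested here, it would presumably follow the usual config_enum_entry pattern; a purely illustrative sketch, with none of these names settled:

typedef enum SyncReplicationSlotsMode
{
	SYNC_REPLICATION_SLOTS_OFF,
	SYNC_REPLICATION_SLOTS_FAILOVER,	/* sync only failover-enabled logical slots */
	SYNC_REPLICATION_SLOTS_ON			/* reserved for "all supported slots" later */
} SyncReplicationSlotsMode;

static const struct config_enum_entry sync_replication_slots_options[] = {
	{"off", SYNC_REPLICATION_SLOTS_OFF, false},
	{"failover", SYNC_REPLICATION_SLOTS_FAILOVER, false},
	{"on", SYNC_REPLICATION_SLOTS_ON, false},
	{NULL, 0, false}
};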
On Wed, Jan 24, 2024 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > +/* GUC variable */ > > > +bool enable_syncslot = false; > > > > > > Is enable_syncslot a really good name? We use "enable" prefix only for > > > planner parameters such as enable_seqscan, and it seems to me that > > > "slot" is not specific. Other candidates are: > > > > > > * synchronize_replication_slots = on|off > > > * synchronize_failover_slots = on|off > > > > > > > I would prefer the second one. Would it be better to just say > > sync_failover_slots? > > Works for me. But if we want to extend this option for non-failover > slots as well in the future, synchronize_replication_slots (or > sync_replication_slots) seems better. We can extend it by having an > enum later. For example, the values can be on, off, or failover etc. > I see your point. Let us see if others have any suggestions on this. -- With Regards, Amit Kapila.
Hi, On Wed, Jan 24, 2024 at 01:51:54PM +0530, Amit Kapila wrote: > On Wed, Jan 24, 2024 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > +/* GUC variable */ > > > > +bool enable_syncslot = false; > > > > > > > > Is enable_syncslot a really good name? We use "enable" prefix only for > > > > planner parameters such as enable_seqscan, and it seems to me that > > > > "slot" is not specific. Other candidates are: > > > > > > > > * synchronize_replication_slots = on|off > > > > * synchronize_failover_slots = on|off > > > > > > > > > > I would prefer the second one. Would it be better to just say > > > sync_failover_slots? > > > > Works for me. But if we want to extend this option for non-failover > > slots as well in the future, synchronize_replication_slots (or > > sync_replication_slots) seems better. We can extend it by having an > > enum later. For example, the values can be on, off, or failover etc. > > > > I see your point. Let us see if others have any suggestions on this. I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then for the current feature I think "failover" and "on" should be the values to turn the feature on (assuming "on" would mean "all kind of supported slots"). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Jan 23, 2024 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > Thanks Ajin for testing the patch. PFA v66 which fixes this issue. > I think we should try to commit the patch as all of the design concerns are resolved now. To achieve that, can we split the failover setting patch into the following: (a) setting failover property via SQL commands and display it in pg_replication_slots (b) replication protocol command (c) failover property via subscription commands? It will make each patch smaller and it would be easier to detect any problem in the same after commit. -- With Regards, Amit Kapila.
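For reference, a minimal sketch of surfaces (a) and (c) of that proposed split, assuming the interfaces land roughly as in the patch set under discussion (slot, publication, and subscription names here are hypothetical; surface (b), the replication protocol command, is sketched further down in the thread):

    -- (a) failover property via SQL command, displayed in pg_replication_slots
    SELECT pg_create_logical_replication_slot('myslot', 'pgoutput', failover => true);
    SELECT slot_name, failover FROM pg_replication_slots;

    -- (c) failover property via subscription commands
    CREATE SUBSCRIPTION mysub CONNECTION 'host=primary dbname=postgres'
        PUBLICATION mypub WITH (failover = true);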
On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Jan 24, 2024 at 01:51:54PM +0530, Amit Kapila wrote: > > On Wed, Jan 24, 2024 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > > > +/* GUC variable */ > > > > > +bool enable_syncslot = false; > > > > > > > > > > Is enable_syncslot a really good name? We use "enable" prefix only for > > > > > planner parameters such as enable_seqscan, and it seems to me that > > > > > "slot" is not specific. Other candidates are: > > > > > > > > > > * synchronize_replication_slots = on|off > > > > > * synchronize_failover_slots = on|off > > > > > > > > > > > > > I would prefer the second one. Would it be better to just say > > > > sync_failover_slots? > > > > > > Works for me. But if we want to extend this option for non-failover > > > slots as well in the future, synchronize_replication_slots (or > > > sync_replication_slots) seems better. We can extend it by having an > > > enum later. For example, the values can be on, off, or failover etc. > > > > > > > I see your point. Let us see if others have any suggestions on this. > > I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then for > the current feature I think "failover" and "on" should be the values to turn the > feature on (assuming "on" would mean "all kind of supported slots"). Even if others agree and we change this GUC name to "sync_replication_slots", I feel we should keep the values as "on" and "off" currently, where "on" would mean 'sync failover slots' (docs can state that clearly). I do not think we should support sync of "all kinds of supported slots" in the first version. Maybe we can think about it for future versions. thanks Shveta
On Wed, Jan 24, 2024 at 4:09 PM shveta malik <shveta.malik@gmail.com> wrote: > > Even if others agree and we change this GUC name to > "sync_replication_slots", I feel we should keep the values as "on" and > "off" currently, where "on" would mean 'sync failover slots' (docs can > state that clearly). I do not think we should support sync of "all > kinds of supported slots" in the first version. Maybe we can think > about it for future versions. PFA v67. Note that the GUC (enable_syncslot) name is unchanged. Once we have final agreement on the name, we can make the change in the next version. Changes in v67 are: 1) Addressed comments by Peter given in [1]. 2) Addressed comments by Swada-San given in [2]. 3) Removed syncing 'failover' on standby from remote_slot. The 'failover' field will be false for synced slots. Since we do not support sync to cascading standbys yet, thus failover=true was misleading and unused there. Thanks Hou-San for contributing in 2. Changes are split across patch001,002 and 003. TODO: --Split patch-001 as suggested in [3]. --Change GUC name. [1]: https://www.postgresql.org/message-id/CAHut%2BPu_uK%3D%3DM%2BVmCMug7m7O6LAwpC05A%3DT7zP8c4G2-hS%2Bbdg%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAD21AoApGoTZu7D_7%3DbVYQqKnj%2BPZ2Rz%2Bnc8Ky1HPQMS_XL6%2BA%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAA4eK1Lxvfq9RwOEsguiMCrKPUc1He9UGz1_wi0N0cJaXFa4Eg%40mail.gmail.com thanks Shveta
Attachment
- v67-0005-Non-replication-connection-and-app_name-change.patch
- v67-0001-Enable-setting-failover-property-for-a-slot-thro.patch
- v67-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v67-0003-Slot-sync-worker-as-a-special-process.patch
- v67-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v67-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
Here are some review comments for the patch v67-0001. ====== 1. There are a couple of places checking for failover usage on a standby. + if (RecoveryInProgress() && failover) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot enable failover for a replication slot" + " created on the standby")); and + if (RecoveryInProgress() && failover) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot enable failover for a replication slot" + " on the standby")); IMO the conditions should be written the other way around (failover && RecoveryInProgress()) to avoid the unnecessary function calls when 'failover' flag is probably mostly default false anyway. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wednesday, January 24, 2024 6:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jan 23, 2024 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > Thanks Ajin for testing the patch. PFA v66 which fixes this issue. > > > > I think we should try to commit the patch as all of the design concerns are > resolved now. To achieve that, can we split the failover setting patch into the > following: (a) setting failover property via SQL commands and display it in > pg_replication_slots (b) replication protocol command (c) failover property via > subscription commands? > > It will make each patch smaller and it would be easier to detect any problem in > the same after commit. Agreed. I split the original 0001 patch into 3 patches as suggested. Here is the V68 patch set. Best Regards, Hou zj
Attachment
- v68-0005-Slot-sync-worker-as-a-special-process.patch
- v68-0006-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v68-0007-Non-replication-connection-and-app_name-change.patch
- v68-0008-Document-the-steps-to-check-if-the-standby-is-re.patch
- v68-0001-Add-the-failover-property-to-replication-slot.patch
- v68-0002-Allow-setting-failover-property-in-the-replicati.patch
- v68-0003-Add-a-failover-option-to-subscriptions.patch
- v68-0004-Add-logical-slot-sync-capability-to-the-physical.patch
On Wed, Jan 24, 2024 at 5:17 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v67. Note that the GUC (enable_syncslot) name is unchanged. Once > we have final agreement on the name, we can make the change in the > next version. > > Changes in v67 are: > > 1) Addressed comments by Peter given in [1]. > 2) Addressed comments by Swada-San given in [2]. > 3) Removed syncing 'failover' on standby from remote_slot. The > 'failover' field will be false for synced slots. Since we do not > support sync to cascading standbys yet, thus failover=true was > misleading and unused there. > But what will happen after the standby is promoted? After promotion, ideally, it should have failover enabled, so that the slots can be synced. Also, note that corresponding subscriptions still have the failover flag enabled. I think we should copy the 'failover' option for the synced slots. -- With Regards, Amit Kapila.
On Wednesday, January 24, 2024 1:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Here are random comments on slotsyncworker.c (v66): Thanks for the comments: > > --- > + elog(ERROR, > + "cannot synchronize local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) as synchronization" > + " would move it backwards", remote_slot->name, > > Many error messages in slotsync.c are splitted into several lines, but I think it > would reduce the greppability when the user looks for the error message in the > source code. Thanks for the suggestion! we combined most of the messages in the new version patch. Although some messages including the above one were kept splitted, because It's too long(> 120 col including the indent) to fit into the screen, so I feel it's better to keep these messages splitted. Best Regards, Hou zj
Here are some review comments for v67-0002. ====== src/backend/replication/logical/slotsync.c 1. +/* The sleep time (ms) between slot-sync cycles varies dynamically + * (within a MIN/MAX range) according to slot activity. See + * wait_for_slot_activity() for details. + */ +#define MIN_WORKER_NAPTIME_MS 200 +#define MAX_WORKER_NAPTIME_MS 30000 /* 30s */ + +static long sleep_ms = MIN_WORKER_NAPTIME_MS; In my previous review for this, I meant for there to be no whitespace between the #defines and the static long sleep_ms so the prior comment then clearly belongs to all 3 lines ~~~ 2. synchronize_one_slot + /* + * Sanity check: Make sure that concerned WAL is received and flushed + * before syncing slot to target lsn received from the primary server. + * + * This check should never pass as on the primary server, we have waited + * for the standby's confirmation before updating the logical slot. + */ + latestFlushPtr = GetStandbyFlushRecPtr(NULL); + if (remote_slot->confirmed_lsn > latestFlushPtr) + { + ereport(LOG, + errmsg("skipping slot synchronization as the received slot sync" + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + remote_slot->name, + LSN_FORMAT_ARGS(latestFlushPtr))); + + return false; + } Previously in v65 this was an elog, but now it is an ereport. But since this is a sanity check condition that "should never pass" wasn't the elog the more appropriate choice? ~~~ 3. synchronize_one_slot + /* + * We don't want any complicated code while holding a spinlock, so do + * namestrcpy() and get_database_oid() outside. + */ + namestrcpy(&plugin_name, remote_slot->plugin); + dbid = get_database_oid(remote_slot->database, false); IMO just simplify the whole comment, here and for the other similar comment in local_slot_update(). SUGGESTION /* Avoid expensive operations while holding a spinlock. */ ~~~ 4. synchronize_slots + /* Construct the remote_slot tuple and synchronize each slot locally */ + tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); + while (tuplestore_gettupleslot(res->tuplestore, true, false, tupslot)) + { + bool isnull; + RemoteSlot *remote_slot = palloc0(sizeof(RemoteSlot)); + Datum d; + + remote_slot->name = TextDatumGetCString(slot_getattr(tupslot, 1, &isnull)); + Assert(!isnull); + + remote_slot->plugin = TextDatumGetCString(slot_getattr(tupslot, 2, &isnull)); + Assert(!isnull); + + /* + * It is possible to get null values for LSN and Xmin if slot is + * invalidated on the primary server, so handle accordingly. + */ + d = slot_getattr(tupslot, 3, &isnull); + remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr : + DatumGetLSN(d); + + d = slot_getattr(tupslot, 4, &isnull); + remote_slot->restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d); + + d = slot_getattr(tupslot, 5, &isnull); + remote_slot->catalog_xmin = isnull ? InvalidTransactionId : + DatumGetTransactionId(d); + + remote_slot->two_phase = DatumGetBool(slot_getattr(tupslot, 6, &isnull)); + Assert(!isnull); + + remote_slot->database = TextDatumGetCString(slot_getattr(tupslot, + 7, &isnull)); + Assert(!isnull); + + d = slot_getattr(tupslot, 8, &isnull); + remote_slot->invalidated = isnull ? RS_INVAL_NONE : + get_slot_invalidation_cause(TextDatumGetCString(d)); Would it be better to get rid of the hardwired column numbers and then be able to use the SLOTSYNC_COLUMN_COUNT already defined as a sanity check? SUGGESTION int col = 0; ... 
remote_slot->name = TextDatumGetCString(slot_getattr(tupslot, ++col, &isnull)); ... remote_slot->plugin = TextDatumGetCString(slot_getattr(tupslot, ++col, &isnull)); ... d = slot_getattr(tupslot, ++col, &isnull); remote_slot->confirmed_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d); ... d = slot_getattr(tupslot, ++col, &isnull); remote_slot->restart_lsn = isnull ? InvalidXLogRecPtr : DatumGetLSN(d); ... d = slot_getattr(tupslot, ++col, &isnull); remote_slot->catalog_xmin = isnull ? InvalidTransactionId : DatumGetTransactionId(d); ... remote_slot->two_phase = DatumGetBool(slot_getattr(tupslot, ++col, &isnull)); ... remote_slot->database = TextDatumGetCString(slot_getattr(tupslot, ++col, &isnull)); ... d = slot_getattr(tupslot, ++col, &isnull); remote_slot->invalidated = isnull ? RS_INVAL_NONE : get_slot_invalidation_cause(TextDatumGetCString(d)); /* Sanity check */ Asert(col == SLOTSYNC_COLUMN_COUNT); ~~~ 5. +static char * +validate_parameters_and_get_dbname(void) +{ + char *dbname; These are configuration issues, so probably all these ereports could also set errcode(ERRCODE_INVALID_PARAMETER_VALUE). ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi, On Wed, Jan 24, 2024 at 04:09:15PM +0530, shveta malik wrote: > On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then for > > the current feature I think "failover" and "on" should be the values to turn the > > feature on (assuming "on" would mean "all kind of supported slots"). > > Even if others agree and we change this GUC name to > "sync_replication_slots", I feel we should keep the values as "on" and > "off" currently, where "on" would mean 'sync failover slots' (docs can > state that clearly). I gave more thoughts on it and I think the values should only be "failover" or "off". The reason is that if we allow "on" and change the "on" behavior in future versions (to support more than failover slots) then that would change the behavior for the ones that used "on". That's right that we can mention it in the docs, but there is still the risk of users not reading the doc (that's why I think that it would be good if we can put this extra "safety" in the code too). > I do not think we should support sync of "all > kinds of supported slots" in the first version. Maybe we can think > about it for future versions. Yeah I think the same (I was mentioning the future "on" behavior up-thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
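To make the naming trade-off concrete, a sketch of how the standby-side setting would look under the two proposals (both spellings are exactly what is still being decided here, so treat them as hypothetical):

    -- proposal 1: boolean, meaning "sync failover slots" for now
    ALTER SYSTEM SET sync_failover_slots = on;

    -- proposal 2: enum-style, extensible later (e.g. off / failover / on)
    ALTER SYSTEM SET sync_replication_slots = 'failover';

    -- followed by a reload or restart, depending on the GUC's final context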
On Thu, Jan 25, 2024 at 9:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 3) Removed syncing 'failover' on standby from remote_slot. The > > 'failover' field will be false for synced slots. Since we do not > > support sync to cascading standbys yet, thus failover=true was > > misleading and unused there. > > > > But what will happen after the standby is promoted? After promotion, > ideally, it should have failover enabled, so that the slots can be > synced. Also, note that corresponding subscriptions still have the > failover flag enabled. I think we should copy the 'failover' option > for the synced slots. Yes, right, missed this point earlier. I will make the change in the next version. thanks Shveta
Hi, On Thu, Jan 25, 2024 at 02:57:30AM +0000, Zhijie Hou (Fujitsu) wrote: > On Wednesday, January 24, 2024 6:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jan 23, 2024 at 5:13 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Thanks Ajin for testing the patch. PFA v66 which fixes this issue. > > > > > > > I think we should try to commit the patch as all of the design concerns are > > resolved now. To achieve that, can we split the failover setting patch into the > > following: (a) setting failover property via SQL commands and display it in > > pg_replication_slots (b) replication protocol command (c) failover property via > > subscription commands? > > > > It will make each patch smaller and it would be easier to detect any problem in > > the same after commit. > > Agreed. I split the original 0001 patch into 3 patches as suggested. > Here is the V68 patch set. Thanks! Some comments. Looking at 0002: 1 === + <para>The following options are supported:</para> What about "The following option is supported"? (as currently only the "FAILOVER" is) 2 === What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT test is added in v68-0004 but I think having some in 0002 would make sense too). Looking at 0003: 1 === + parameter specified in the subscription. When creating the slot, + ensure the slot <literal>failover</literal> property matches the + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + parameter value of the subscription. What about explaining what would be the consequence of not doing so? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
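For reference, a rough sketch of how the new protocol pieces from 0002/0004 would be exercised from psql over a logical replication connection (slot name and database are hypothetical, and the exact option syntax is still subject to review):

    $ psql "dbname=postgres replication=database" \
        -c "CREATE_REPLICATION_SLOT myslot LOGICAL pgoutput (FAILOVER);"
    $ psql "dbname=postgres replication=database" \
        -c "ALTER_REPLICATION_SLOT myslot (FAILOVER false);"

The TAP tests being discussed would assert roughly this behaviour, including the failover column flipping in pg_replication_slots.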
On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Jan 25, 2024 at 02:57:30AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > Agreed. I split the original 0001 patch into 3 patches as suggested. > > Here is the V68 patch set. Thanks, I have pushed 0001. > > Thanks! > > Some comments. > > Looking at 0002: > > 1 === > > + <para>The following options are supported:</para> > > What about "The following option is supported"? (as currently only the "FAILOVER" > is) > > 2 === > > What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT test > is added in v68-0004 but I think having some in 0002 would make sense too). > The subscription tests in v68-0003 will test this functionality. The one advantage of adding separate tests for this is that if in the future we extend this replication command, it could be convenient to test various options. However, the same could be said about existing replication commands as well. But is it worth having extra tests which will be anyway covered in the next commit in a few days? I understand that it is a good idea and makes one comfortable to have tests for each separate commit but OTOH, in the longer term it will just be adding more test time without achieving much benefit. I think we can tell explicitly in the commit message of this patch that the subsequent commit will cover the tests for this functionality One minor comment on 0002: + so that logical replication can be resumed after failover. + </para> Can we move this and similar comments or doc changes to the later 0004 patch where we are syncing the slots? -- With Regards, Amit Kapila.
On Thu, Jan 25, 2024 at 10:39 AM Peter Smith <smithpb2250@gmail.com> wrote: > 2. synchronize_one_slot > > + /* > + * Sanity check: Make sure that concerned WAL is received and flushed > + * before syncing slot to target lsn received from the primary server. > + * > + * This check should never pass as on the primary server, we have waited > + * for the standby's confirmation before updating the logical slot. > + */ > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + { > + ereport(LOG, > + errmsg("skipping slot synchronization as the received slot sync" > + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > + remote_slot->name, > + LSN_FORMAT_ARGS(latestFlushPtr))); > + > + return false; > + } > > Previously in v65 this was an elog, but now it is an ereport. But > since this is a sanity check condition that "should never pass" wasn't > the elog the more appropriate choice? We realized that this scenario can be frequently hit when the user has not set standby_slot_names on primary. And thus ereport makes more sense. But I agree that this comment is misleading. We will adjust the comment in the next version. thanks Shveta
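Since this situation is easy to get into, a minimal sketch of the primary-side setting involved, using the GUC name from the current patch set (the physical slot name is hypothetical):

    -- on the primary: physical slots of the standbys that logical walsenders
    -- associated with failover slots must wait for before sending changes
    ALTER SYSTEM SET standby_slot_names = 'physical_standby1';
    SELECT pg_reload_conf();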
Hi, On Thu, Jan 25, 2024 at 03:54:45PM +0530, Amit Kapila wrote: > On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Jan 25, 2024 at 02:57:30AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Agreed. I split the original 0001 patch into 3 patches as suggested. > > > Here is the V68 patch set. > > Thanks, I have pushed 0001. > > > > > Thanks! > > > > Some comments. > > > > Looking at 0002: > > > > 1 === > > > > + <para>The following options are supported:</para> > > > > What about "The following option is supported"? (as currently only the "FAILOVER" > > is) > > > > 2 === > > > > What about adding some TAP tests too? (I can see that ALTER_REPLICATION_SLOT test > > is added in v68-0004 but I think having some in 0002 would make sense too). > > > > The subscription tests in v68-0003 will test this functionality. The > one advantage of adding separate tests for this is that if in the > future we extend this replication command, it could be convenient to > test various options. However, the same could be said about existing > replication commands as well. I initially did check for "START_REPLICATION" and I saw it's part of 006_logical_decoding.pl (but did not check all the "REPLICATION" commands). That said, it's more a Nit and I think it's fine with having the test in v68-0004 (as it is currently done) + the ones in v68-0003. > But is it worth having extra tests which > will be anyway covered in the next commit in a few days? > > I understand that it is a good idea and makes one comfortable to have > tests for each separate commit but OTOH, in the longer term it will > just be adding more test time without achieving much benefit. I think > we can tell explicitly in the commit message of this patch that the > subsequent commit will cover the tests for this functionality Yeah, I think that's enough (at least someone reading the commit message, the diff changes and not following this dedicated thread closely would know the lack of test is not a miss). > One minor comment on 0002: > + so that logical replication can be resumed after failover. > + </para> > > Can we move this and similar comments or doc changes to the later 0004 > patch where we are syncing the slots? Sure. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, January 25, 2024 6:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 25, 2024 at 1:25 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Jan 25, 2024 at 02:57:30AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Agreed. I split the original 0001 patch into 3 patches as suggested. > > > Here is the V68 patch set. > > Thanks, I have pushed 0001. > > > > > Thanks! > > > > Some comments. > > > > Looking at 0002: > > > > 1 === > > > > + <para>The following options are supported:</para> > > > > What about "The following option is supported"? (as currently only the > "FAILOVER" > > is) > > > > 2 === > > > > What about adding some TAP tests too? (I can see that > > ALTER_REPLICATION_SLOT test is added in v68-0004 but I think having some > in 0002 would make sense too). > > > > The subscription tests in v68-0003 will test this functionality. The one > advantage of adding separate tests for this is that if in the future we extend this > replication command, it could be convenient to test various options. However, > the same could be said about existing replication commands as well. But is it > worth having extra tests which will be anyway covered in the next commit in a > few days? > > I understand that it is a good idea and makes one comfortable to have tests for > each separate commit but OTOH, in the longer term it will just be adding more > test time without achieving much benefit. I think we can tell explicitly in the > commit message of this patch that the subsequent commit will cover the tests > for this functionality Agreed. > > One minor comment on 0002: > + so that logical replication can be resumed after failover. > + </para> > > Can we move this and similar comments or doc changes to the later 0004 patch > where we are syncing the slots? Thanks for the comment. Here is the V69 patch set which includes the following changes. V69-0001, V69-0002 1) Addressed Bertrand's comments[1]. V69-0003 1) Addressed Peter's comment in [2], [3] 2) Addressed Amit's comment in [4] and above. 3) Fixed one issue that the startup process may report ERROR if it tries to drop the same slot that the slotsync worker is acquiring. Now we take shared lock on db in slot-sync worker before we create, update or drop any of its slots. This is done to prevent potential conflict with ReplicationSlotsDropDBSlots() in case that database is dropped in parallel. V69-0004 1) Rebased and fixed one CFbot failure. V69-0005, V69-0006, V69-0007 1) Rebased. Thanks Shveta for rebasing and working for the changes on 0003~0007. [1] https://www.postgresql.org/message-id/ZbIT9Kj3d8TFD8h6%40ip-10-97-1-34.eu-west-3.compute.internal [2]: https://www.postgresql.org/message-id/CAHut%2BPt2oLfxv_%3DGN23dOOduKHBHdAkCvwSZiwSbtTJFFbQm-w%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAHut%2BPtsDYPbg7qM1nGWtJcSQBQ5JH%3DLmgyqwqBPL9k%2Bz8f5Ew%40mail.gmail.com [4]: https://www.postgresql.org/message-id/CAA4eK1%2B4PhO-f4%2B2fForG6MOEj3jbtee_PYPtwtgww%3DonC5DSQ%40mail.gmail.com Best Regards, Hou zj
Attachment
- v69-0006-Non-replication-connection-and-app_name-change.patch
- v69-0007-Document-the-steps-to-check-if-the-standby-is-re.patch
- v69-0003-Add-logical-slot-sync-capability-to-the-physical.patch
- v69-0004-Slot-sync-worker-as-a-special-process.patch
- v69-0005-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v69-0002-Add-a-failover-option-to-subscriptions.patch
- v69-0001-Allow-setting-failover-property-in-the-replicati.patch
Hi, On Thu, Jan 25, 2024 at 01:11:50PM +0000, Zhijie Hou (Fujitsu) wrote: > Here is the V69 patch set which includes the following changes. > > V69-0001, V69-0002 > 1) Addressed Bertrand's comments[1]. Thanks! V69-0001 LGTM. As far V69-0002 I just have one more last remark: + */ + if (sub->enabled) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set %s for enabled subscription", + "failover"))); Worth to add a test for it in 050_standby_failover_slots_sync.pl? (I had a quick look and it does not seem to be covered). Remarks, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
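For context, a sketch of the behaviour such a test would pin down (subscription name hypothetical; the error text is the one quoted above):

    ALTER SUBSCRIPTION regress_mysub1 SET (failover = true);
    ERROR:  cannot set failover for enabled subscription

    -- the supported sequence: disable first, change failover, re-enable
    ALTER SUBSCRIPTION regress_mysub1 DISABLE;
    ALTER SUBSCRIPTION regress_mysub1 SET (failover = true);
    ALTER SUBSCRIPTION regress_mysub1 ENABLE;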
On Thu, Jan 25, 2024 at 6:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the V69 patch set which includes the following changes. > > V69-0001, V69-0002 > Few minor comments on v69-0001 1. In libpqrcv_create_slot(), I see we are using two types of syntaxes based on 'use_new_options_syntax' (aka server_version >= 15) whereas this new 'failover' option doesn't follow that. What is the reason of the same? I thought it is because older versions anyway won't support this option. However, I guess we should follow the syntax of the old server and let it error out. BTW, did you test this patch with old server versions (say < 15 and >=15) by directly using replication commands, if so, what is the behavior of same? 2. } - + if (failover) + appendStringInfoString(&cmd, "FAILOVER, "); Spurious line removal. Also, to follow a coding pattern similar to nearby code, let's have one empty line after handling of failover. 3. +/* ALTER_REPLICATION_SLOT slot */ +alter_replication_slot: + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')' I think it would be better if we follow the create style by specifying syntax in comments as that can make the code easier to understand after future extensions to this command if any. See create_replication_slot: /* CREATE_REPLICATION_SLOT slot [TEMPORARY] PHYSICAL [options] */ K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL create_slot_options -- With Regards, Amit Kapila.
On Saturday, January 27, 2024 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jan 25, 2024 at 6:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Here is the V69 patch set which includes the following changes. > > > > V69-0001, V69-0002 > > > > Few minor comments on v69-0001 > 1. In libpqrcv_create_slot(), I see we are using two types of syntaxes based on > 'use_new_options_syntax' (aka server_version >= 15) whereas this new 'failover' > option doesn't follow that. What is the reason of the same? I thought it is > because older versions anyway won't support this option. However, I guess we > should follow the syntax of the old server and let it error out. Changed as suggested. > BTW, did you test > this patch with old server versions (say < 15 and >=15) by directly using > replication commands, if so, what is the behavior of same? Yes, I tested it. We cannot use new failover option or new alter_replication_slot on server <17, the errors we will get are as follows: Using failover option in create_replication_slot on server 15 ~ 16 ERROR: unrecognized option: failover Using failover option in create_replication_slot on server < 15 ERROR: syntax error Alter_replication_slot on server < 17 ERROR: syntax error at or near "ALTER_REPLICATION_SLOT" > > 2. > } > - > + if (failover) > + appendStringInfoString(&cmd, "FAILOVER, "); > > Spurious line removal. Also, to follow a coding pattern similar to nearby code, > let's have one empty line after handling of failover. Changed. > > 3. > +/* ALTER_REPLICATION_SLOT slot */ > +alter_replication_slot: > + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')' > > I think it would be better if we follow the create style by specifying syntax in > comments as that can make the code easier to understand after future > extensions to this command if any. See > create_replication_slot: > /* CREATE_REPLICATION_SLOT slot [TEMPORARY] PHYSICAL [options] */ > K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_PHYSICAL > create_slot_options Changed. Attach the V70 patch set which addressed above comments and Bertrand's comments in [1] [1] https://www.postgresql.org/message-id/ZbNt1oRZRcdIAw2c%40ip-10-97-1-34.eu-west-3.compute.internal Best Regards, Hou zj
Attachment
- v70-0006-Non-replication-connection-and-app_name-change.patch
- v70-0004-Slot-sync-worker-as-a-special-process.patch
- v70-0007-Document-the-steps-to-check-if-the-standby-is-re.patch
- v70-0001-Allow-setting-failover-property-in-the-replicati.patch
- v70-0002-Add-a-failover-option-to-subscriptions.patch
- v70-0003-Add-logical-slot-sync-capability-to-the-physical.patch
- v70-0005-Allow-logical-walsenders-to-wait-for-the-physica.patch
Here are some review comments for v70-0001. ====== doc/src/sgml/protocol.sgml 1. Related to this, please also review my other patch to the same docs page protocol.sgml [1]. ====== src/backend/replication/logical/tablesync.c 2. walrcv_create_slot(LogRepWorkerWalRcvConn, slotname, false /* permanent */ , false /* two_phase */ , + false, CRS_USE_SNAPSHOT, origin_startpos); I know it was previously mentioned in this thread that inline parameter comments are unnecessary, but here they are already in the existing code so shouldn't we do the same? ====== src/backend/replication/repl_gram.y 3. +/* ALTER_REPLICATION_SLOT slot options */ +alter_replication_slot: + K_ALTER_REPLICATION_SLOT IDENT '(' generic_option_list ')' + { + AlterReplicationSlotCmd *cmd; + cmd = makeNode(AlterReplicationSlotCmd); + cmd->slotname = $2; + cmd->options = $4; + $$ = (Node *) cmd; + } + ; + IMO write that comment with parentheses, so it matches the code. SUGGESTION ALTER_REPLICATION_SLOT slot ( options ) ====== [1] https://www.postgresql.org/message-id/CAHut%2BPtDWSmW8uiRJF1LfGQJikmo7V2jdysLuRmtsanNZc7fNw%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Mon, Jan 29, 2024 at 9:21 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v70-0001. > > ====== > doc/src/sgml/protocol.sgml > > 1. > Related to this, please also review my other patch to the same docs > page protocol.sgml [1]. > We can check that separately. > ====== > src/backend/replication/logical/tablesync.c > > 2. > walrcv_create_slot(LogRepWorkerWalRcvConn, > slotname, false /* permanent */ , false /* two_phase */ , > + false, > CRS_USE_SNAPSHOT, origin_startpos); > > I know it was previously mentioned in this thread that inline > parameter comments are unnecessary, but here they are already in the > existing code so shouldn't we do the same? > I think it is better to remove the even existing ones as those many times make code difficult to read. -- With Regards, Amit Kapila.
On Sat, Jan 27, 2024 at 12:02 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V70 patch set which addressed above comments and Bertrand's comments in [1] > Since v70-0001 is pushed, rebased and attached v70_2 patches. There are no new changes. thanks Shveta
Attachment
- v70_2-0002-Add-logical-slot-sync-capability-to-the-physic.patch
- v70_2-0001-Add-a-failover-option-to-subscriptions.patch
- v70_2-0004-Allow-logical-walsenders-to-wait-for-the-physi.patch
- v70_2-0003-Slot-sync-worker-as-a-special-process.patch
- v70_2-0005-Non-replication-connection-and-app_name-change.patch
- v70_2-0006-Document-the-steps-to-check-if-the-standby-is-.patch
Hi, On Mon, Jan 29, 2024 at 10:24:11AM +0530, shveta malik wrote: > On Sat, Jan 27, 2024 at 12:02 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Attach the V70 patch set which addressed above comments and Bertrand's comments in [1] > > > > Since v70-0001 is pushed, rebased and attached v70_2 patches. There > are no new changes. Thanks! Looking at 0001: + When altering the + <link linkend="sql-createsubscription-params-with-slot-name"><literal>slot_name</literal></link>, + the <literal>failover</literal> property value of the named slot may differ from the + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + parameter specified in the subscription. When creating the slot, + ensure the slot <literal>failover</literal> property matches the + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + parameter value of the subscription. Otherwise, the slot on the publisher may + not be enabled to be synced to standbys. Not related to this patch series but while at it shouldn't we also add a few words about two_phase too? (I mean ensure the slot property matchs the subscription one). Or would it be better to create a dedicated patch (outside of this thread) for the "two_phase" remark? (If so I can take care of it). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
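A sketch of the kind of check such documentation would point users at, assuming the catalog columns from the patches under discussion (subfailover) next to the existing two_phase information; the slot and subscription names are hypothetical:

    -- on the publisher: properties of the manually created slot
    SELECT slot_name, two_phase, failover
    FROM pg_replication_slots WHERE slot_name = 'mysub';

    -- on the subscriber: the corresponding subscription options
    SELECT subname, subtwophasestate, subfailover
    FROM pg_subscription WHERE subname = 'mysub';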
On Mon, Jan 29, 2024 at 2:22 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Mon, Jan 29, 2024 at 10:24:11AM +0530, shveta malik wrote: > > On Sat, Jan 27, 2024 at 12:02 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > Attach the V70 patch set which addressed above comments and Bertrand's comments in [1] > > > > > > > Since v70-0001 is pushed, rebased and attached v70_2 patches. There > > are no new changes. > > Thanks! > > Looking at 0001: > > + When altering the > + <link linkend="sql-createsubscription-params-with-slot-name"><literal>slot_name</literal></link>, > + the <literal>failover</literal> property value of the named slot may differ from the > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > + parameter specified in the subscription. When creating the slot, > + ensure the slot <literal>failover</literal> property matches the > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > + parameter value of the subscription. Otherwise, the slot on the publisher may > + not be enabled to be synced to standbys. > > Not related to this patch series but while at it shouldn't we also add a few > words about two_phase too? (I mean ensure the slot property matchs the > subscription one). > > Or would it be better to create a dedicated patch (outside of this thread) for > the "two_phase" remark? (If so I can take care of it). > I think it is better to create a separate patch for two_phase after this patch gets committed. -- With Regards, Amit Kapila.
Hi, On Mon, Jan 29, 2024 at 02:35:52PM +0530, Amit Kapila wrote: > On Mon, Jan 29, 2024 at 2:22 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Looking at 0001: > > > > + When altering the > > + <link linkend="sql-createsubscription-params-with-slot-name"><literal>slot_name</literal></link>, > > + the <literal>failover</literal> property value of the named slot may differ from the > > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > > + parameter specified in the subscription. When creating the slot, > > + ensure the slot <literal>failover</literal> property matches the > > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > > + parameter value of the subscription. Otherwise, the slot on the publisher may > > + not be enabled to be synced to standbys. > > > > Not related to this patch series but while at it shouldn't we also add a few > > words about two_phase too? (I mean ensure the slot property matchs the > > subscription one). > > > > Or would it be better to create a dedicated patch (outside of this thread) for > > the "two_phase" remark? (If so I can take care of it). > > > > I think it is better to create a separate patch for two_phase after > this patch gets committed. Yeah, makes sense, will do, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jan 29, 2024 at 9:35 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > 2. > > walrcv_create_slot(LogRepWorkerWalRcvConn, > > slotname, false /* permanent */ , false /* two_phase */ , > > + false, > > CRS_USE_SNAPSHOT, origin_startpos); > > > > I know it was previously mentioned in this thread that inline > > parameter comments are unnecessary, but here they are already in the > > existing code so shouldn't we do the same? > > > > I think it is better to remove the even existing ones as those many > times make code difficult to read. I had earlier added inline comments in callers of ReplicationSlotCreate() and walrcv_connect() for the new args 'synced' and 'replication' respectively; those changes are removed from patch002 and patch005 now. Also improved the alter-sub doc in patch001 as suggested by Peter offlist. PFA the v71 patch set with the above changes. thanks Shveta
Attachment
- v71-0005-Non-replication-connection-and-app_name-change.patch
- v71-0004-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v71-0003-Slot-sync-worker-as-a-special-process.patch
- v71-0001-Add-a-failover-option-to-subscriptions.patch
- v71-0002-Add-logical-slot-sync-capability-to-the-physical.patch
- v71-0006-Document-the-steps-to-check-if-the-standby-is-re.patch
On Mon, Jan 29, 2024 at 3:11 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v71 patch set with above changes. > Few comments on 0001 =================== 1. parse_subscription_options() { ... /* * We've been explicitly asked to not connect, that requires some * additional processing. */ if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) { Here, along with other options, we need an explicit check for failover, so that if connect=false and failover=true, the statement should give error. I was expecting the below statement to fail but it passed with WARNING. postgres=# create subscription sub2 connection 'dbname=postgres' publication pub2 with(connect=false, failover=true); WARNING: subscription was created, but is not connected HINT: To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription. CREATE SUBSCRIPTION 2. @@ -148,6 +153,10 @@ typedef struct Subscription List *publications; /* List of publication names to subscribe to */ char *origin; /* Only publish data originating from the * specified origin */ + bool failover; /* True if the associated replication slots + * (i.e. the main slot and the table sync + * slots) in the upstream database are enabled + * to be synchronized to the standbys. */ } Subscription; Let's add this new field immediately after "bool runasowner;" as is done for other boolean members. This will help avoid increasing the size of the structure due to alignment when we add any new pointer field in the future. Also, that would be consistent with what we do for other new boolean members. -- With Regards, Amit Kapila.
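A sketch of the behaviour being asked for in comment 1, reusing the statement from above (the exact error wording would be up to the patch):

    -- expected to be rejected once the explicit check is added:
    CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=postgres' PUBLICATION pub2
        WITH (connect = false, failover = true);

    -- instead, leave failover at its default and set it later, once the
    -- replication slot has been created manually:
    CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=postgres' PUBLICATION pub2
        WITH (connect = false);
    ALTER SUBSCRIPTION sub2 SET (failover = true);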
On Monday, January 29, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 29, 2024 at 3:11 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > PFA v71 patch set with above changes. > > > > Few comments on 0001 Thanks for the comments. > =================== > 1. > parse_subscription_options() > { > ... > /* > * We've been explicitly asked to not connect, that requires some > * additional processing. > */ > if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) { > > Here, along with other options, we need an explicit check for failover, so that if > connect=false and failover=true, the statement should give error. I was > expecting the below statement to fail but it passed with WARNING. > postgres=# create subscription sub2 connection 'dbname=postgres' > publication pub2 with(connect=false, failover=true); > WARNING: subscription was created, but is not connected > HINT: To initiate replication, you must manually create the replication slot, > enable the subscription, and refresh the subscription. > CREATE SUBSCRIPTION Added. > > 2. > @@ -148,6 +153,10 @@ typedef struct Subscription > List *publications; /* List of publication names to subscribe to */ > char *origin; /* Only publish data originating from the > * specified origin */ > + bool failover; /* True if the associated replication slots > + * (i.e. the main slot and the table sync > + * slots) in the upstream database are enabled > + * to be synchronized to the standbys. */ > } Subscription; > > Let's add this new field immediately after "bool runasowner;" as is done for > other boolean members. This will help avoid increasing the size of the structure > due to alignment when we add any new pointer field in the future. Also, that > would be consistent with what we do for other new boolean members. Moved this field as suggested. Attach the V72-0001 which addressed above comments, other patches will be rebased and posted after pushing first patch. Thanks Shveta for helping address the comments. Best Regards, Hou zj
Attachment
On Monday, January 29, 2024 9:17 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, January 29, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Mon, Jan 29, 2024 at 3:11 PM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > PFA v71 patch set with above changes. > > > > > > > Few comments on 0001 > > Thanks for the comments. > > > =================== > > 1. > > parse_subscription_options() > > { > > ... > > /* > > * We've been explicitly asked to not connect, that requires some > > * additional processing. > > */ > > if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) { > > > > Here, along with other options, we need an explicit check for > > failover, so that if connect=false and failover=true, the statement > > should give error. I was expecting the below statement to fail but it passed > with WARNING. > > postgres=# create subscription sub2 connection 'dbname=postgres' > > publication pub2 with(connect=false, failover=true); > > WARNING: subscription was created, but is not connected > > HINT: To initiate replication, you must manually create the > > replication slot, enable the subscription, and refresh the subscription. > > CREATE SUBSCRIPTION > > Added. > > > > > 2. > > @@ -148,6 +153,10 @@ typedef struct Subscription > > List *publications; /* List of publication names to subscribe to */ > > char *origin; /* Only publish data originating from the > > * specified origin */ > > + bool failover; /* True if the associated replication slots > > + * (i.e. the main slot and the table sync > > + * slots) in the upstream database are enabled > > + * to be synchronized to the standbys. */ > > } Subscription; > > > > Let's add this new field immediately after "bool runasowner;" as is > > done for other boolean members. This will help avoid increasing the > > size of the structure due to alignment when we add any new pointer > > field in the future. Also, that would be consistent with what we do for other > new boolean members. > > Moved this field as suggested. > > Attach the V72-0001 which addressed above comments, other patches will be > rebased and posted after pushing first patch. Thanks Shveta for helping > address the comments. Apart from above comments. The new V72 patch also includes the followings changes. 1. Moved the test 'altering failover for enabled sub' to the tap-test where most of the alter-sub behaviors are tested. 2. Rename the tap-test from 050_standby_failover_slots_sync.pl to 040_standby_failover_slots_sync.pl (the big number 050 was used to avoid conflict with other newly committed tests). And add the test into meson.build which was missed. Best Regards, Hou zj
Here are some review comments for v72-0001 ====== doc/src/sgml/ref/alter_subscription.sgml 1. + parameter value of the subscription. Otherwise, the slot on the + publisher may behave differently from what subscription's + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + option says. The slot on the publisher could either be + synced to the standbys even when the subscription's + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + option is disabled or could be disabled for sync + even when the subscription's + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> + option is enabled. + </para> It is a bit wordy to keep saying "disabled/enabled" BEFORE The slot on the publisher could either be synced to the standbys even when the subscription's failover option is disabled or could be disabled for sync even when the subscription's failover option is enabled. SUGGESTION The slot on the publisher could be synced to the standbys even when the subscription's failover = false or may not be syncing even when the subscription's failover = true. ====== .../t/040_standby_failover_slots_sync.pl 2. +# Enable subscription +$subscriber1->safe_psql('postgres', + "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); + +# Disable failover for enabled subscription +my ($result, $stdout, $stderr) = $subscriber1->psql('postgres', + "ALTER SUBSCRIPTION regress_mysub1 SET (failover = false)"); +ok( $stderr =~ /ERROR: cannot set failover for enabled subscription/, + "altering failover is not allowed for enabled subscription"); + Currently, those tests are under scope the big comment: +################################################## +# Test that changing the failover property of a subscription updates the +# corresponding failover property of the slot. +################################################## But that comment is not quite relevant to these tests. So, add another one just these: SUGGESTION: ################################################## # Test that cannot modify the failover option for enabled subscriptions. ################################################## ====== Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Jan 30, 2024 at 7:29 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v72-0001 > > ====== > doc/src/sgml/ref/alter_subscription.sgml > > 1. > + parameter value of the subscription. Otherwise, the slot on the > + publisher may behave differently from what subscription's > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > + option says. The slot on the publisher could either be > + synced to the standbys even when the subscription's > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > + option is disabled or could be disabled for sync > + even when the subscription's > + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link> > + option is enabled. > + </para> > > It is a bit wordy to keep saying "disabled/enabled" > > BEFORE > The slot on the publisher could either be synced to the standbys even > when the subscription's failover option is disabled or could be > disabled for sync even when the subscription's failover option is > enabled. > > SUGGESTION > The slot on the publisher could be synced to the standbys even when > the subscription's failover = false or may not be syncing even when > the subscription's failover = true. > I think it is a matter of personal preference because I find the existing wording in the patch easier to follow. So, I would like to retain that as it is. -- With Regards, Amit Kapila.
On Mon, Jan 29, 2024 at 6:47 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, January 29, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > =================== > > 1. > > parse_subscription_options() > > { > > ... > > /* > > * We've been explicitly asked to not connect, that requires some > > * additional processing. > > */ > > if (!opts->connect && IsSet(supported_opts, SUBOPT_CONNECT)) { > > > > Here, along with other options, we need an explicit check for failover, so that if > > connect=false and failover=true, the statement should give error. I was > > expecting the below statement to fail but it passed with WARNING. > > postgres=# create subscription sub2 connection 'dbname=postgres' > > publication pub2 with(connect=false, failover=true); > > WARNING: subscription was created, but is not connected > > HINT: To initiate replication, you must manually create the replication slot, > > enable the subscription, and refresh the subscription. > > CREATE SUBSCRIPTION > > Added. > In this regard, I feel we don't need to dump/restore the 'FAILOVER' option non-binary upgrade paths similar to the 'ENABLE' option. For binary upgrade, if the failover option is enabled, then we can enable it using Alter Subscription SET (failover=true). Let's add one test corresponding to this behavior in postgresql\src\bin\pg_upgrade\t\004_subscription. Additionally, we need to update the pg_dump docs for the 'failover' option. See "When dumping logical replication subscriptions, .." [1]. I think we also need to update the connect option docs in CREATE SUBSCRIPTION [2]. [1] - https://www.postgresql.org/docs/devel/app-pgdump.html [2] - https://www.postgresql.org/docs/devel/sql-createsubscription.html -- With Regards, Amit Kapila.
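For reference, a sketch of the post-restore steps the updated pg_dump documentation would describe, assuming the slot is created manually on the publisher (slot, plugin, and subscription names are hypothetical):

    -- on the publisher: create the slot the restored subscription will use,
    -- with failover enabled so it can be synchronized to standbys
    SELECT pg_create_logical_replication_slot('mysub', 'pgoutput', failover => true);

    -- on the subscriber: mark the restored subscription before enabling it
    ALTER SUBSCRIPTION mysub SET (failover = true);
    ALTER SUBSCRIPTION mysub ENABLE;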
On Tue, Jan 30, 2024 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > In this regard, I feel we don't need to dump/restore the 'FAILOVER' > option non-binary upgrade paths similar to the 'ENABLE' option. For > binary upgrade, if the failover option is enabled, then we can enable > it using Alter Subscription SET (failover=true). Let's add one test > corresponding to this behavior in > postgresql\src\bin\pg_upgrade\t\004_subscription. Changed pg_dump behaviour as suggested and added additional test. > Additionally, we need to update the pg_dump docs for the 'failover' > option. See "When dumping logical replication subscriptions, .." [1]. > I think we also need to update the connect option docs in CREATE > SUBSCRIPTION [2]. Updated docs. > [1] - https://www.postgresql.org/docs/devel/app-pgdump.html > [2] - https://www.postgresql.org/docs/devel/sql-createsubscription.html PFA v73-0001 which addresses the above comments. Other patches will be rebased and posted after pushing this one. Thanks Hou-San for adding pg_upgrade test for failover. thanks Shveta
Attachment
On Tue, Jan 30, 2024 at 4:06 PM shveta malik <shveta.malik@gmail.com> wrote: > > PFA v73-0001 which addresses the above comments. Other patches will be rebased and posted after pushing this one. Since v73-0001 is pushed, PFA the rest of the patches. Changes are: 1) Rebased the patches. 2) Ran pgindent on all. 3) patch001: Updated logicaldecoding.sgml for the dbname requirement in primary_conninfo for slot synchronization. thanks Shveta
Attachment
- v74-0002-Slot-sync-worker-as-a-special-process.patch
- v74-0003-Allow-logical-walsenders-to-wait-for-the-physica.patch
- v74-0005-Document-the-steps-to-check-if-the-standby-is-re.patch
- v74-0004-Non-replication-connection-and-app_name-change.patch
- v74-0001-Add-logical-slot-sync-capability-to-the-physical.patch
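To illustrate the documentation change in point 3 above, a sketch of the standby-side settings involved in slot synchronization (host, port, user, and database are hypothetical; enable_syncslot is the GUC name still under discussion up-thread):

    -- on the standby: primary_conninfo must now include a dbname so the
    -- slot-sync worker can connect to the primary
    ALTER SYSTEM SET primary_conninfo = 'host=primary port=5432 user=repluser dbname=postgres';
    ALTER SYSTEM SET enable_syncslot = on;
    -- the patch set also expects hot_standby_feedback and a physical slot
    -- (primary_slot_name) to be in use; reload or restart as appropriate
    SELECT pg_reload_conf();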
Hi, On Mon, Jan 29, 2024 at 09:15:57AM +0000, Bertrand Drouvot wrote: > Hi, > > On Mon, Jan 29, 2024 at 02:35:52PM +0530, Amit Kapila wrote: > > I think it is better to create a separate patch for two_phase after > > this patch gets committed. > > Yeah, makes sense, will do, thanks! It's done in [1]. [1]: https://www.postgresql.org/message-id/ZbkYrLPhH%2BRxpZlW%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, I saw that v73-0001 was pushed, but it included some last-minute changes that I did not get a chance to check yesterday. Here are some review comments for the new parts of that patch. ====== doc/src/sgml/ref/create_subscription.sgml 1. connect (boolean) Specifies whether the CREATE SUBSCRIPTION command should connect to the publisher at all. The default is true. Setting this to false will force the values of create_slot, enabled, copy_data, and failover to false. (You cannot combine setting connect to false with setting create_slot, enabled, copy_data, or failover to true.) ~ I don't think the first part "Setting this to false will force the values ... failover to false." is strictly correct. I think is correct to say all those *other* properties (create_slot, enabled, copy_data) are forced to false because those otherwise have default true values. But the 'failover' has default false, so it cannot get force-changed at all because you can't set connect to false when failover is true as the second part ("You cannot combine...") explains. IMO remove 'failover' from that first sentence. ~~~ 2. <para> Since no connection is made when this option is <literal>false</literal>, no tables are subscribed. To initiate replication, you must manually create the replication slot, enable - the subscription, and refresh the subscription. See + the failover if required, enable the subscription, and refresh the + subscription. See <xref linkend="logical-replication-subscription-examples-deferred-slot"/> for examples. </para> IMO "see the failover if required" is very vague. See what failover? The slot property failover, or the subscription option failover? And "see" it for what purpose? I think the intention was probably to say something like "ensure the manually created slot has the same matching failover property value as the subscriber failover option", but that is not clear from the current text. ====== doc/src/sgml/ref/pg_dump.sgml 3. dump can be restored without requiring network access to the remote servers. It is then up to the user to reactivate the subscriptions in a suitable way. If the involved hosts have changed, the connection - information might have to be changed. It might also be appropriate to + information might have to be changed. If the subscription needs to + be enabled for + <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>, + then same needs to be done by executing + <link linkend="sql-altersubscription-params-set"> + <literal>ALTER SUBSCRIPTION ... SET(failover = true)</literal></link> + after the slot has been created. It might also be appropriate to "then same needs to be done" (English?) BEFORE If the subscription needs to be enabled for failover, then same needs to be done by executing ALTER SUBSCRIPTION ... SET(failover = true) after the slot has been created. SUGGESTION If the subscription needs to be enabled for failover, execute ALTER SUBSCRIPTION ... SET(failover = true) after the slot has been created. ====== src/backend/commands/subscriptioncmds.c 4. #define SUBOPT_RUN_AS_OWNER 0x00001000 -#define SUBOPT_LSN 0x00002000 -#define SUBOPT_ORIGIN 0x00004000 +#define SUBOPT_FAILOVER 0x00002000 +#define SUBOPT_LSN 0x00004000 +#define SUBOPT_ORIGIN 0x00008000 + A spurious blank line was added. ====== src/bin/pg_upgrade/t/004_subscription.pl 5. -# The subscription's running status should be preserved. Old subscription -# regress_sub1 should be enabled and old subscription regress_sub2 should be -# disabled. 
+# The subscription's running status and failover option should be preserved. +# Old subscription regress_sub1 should have enabled and failover as true while +# old subscription regress_sub2 should have enabled and failover as false. $result = $new_sub->safe_psql('postgres', - "SELECT subname, subenabled FROM pg_subscription ORDER BY subname"); -is( $result, qq(regress_sub1|t -regress_sub2|f), + "SELECT subname, subenabled, subfailover FROM pg_subscription ORDER BY subname"); +is( $result, qq(regress_sub1|t|t +regress_sub2|f|f), "check that the subscription's running status are preserved"); ~ Calling those "old subscriptions" seems misleading. Aren't these the new/upgraded subscriptions being checked here? Should the comment be more like: # The subscription's running status and failover option should be preserved. # Upgraded regress_sub1 should still have enabled and failover as true. # Upgraded regress_sub2 should still have enabled and failover as false. ====== Kind Regards, Peter Smith. Fujitsu Australia.
On Wed, Jan 31, 2024 at 7:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > I saw that v73-0001 was pushed, but it included some last-minute > changes that I did not get a chance to check yesterday. > > Here are some review comments for the new parts of that patch. > > ====== > doc/src/sgml/ref/create_subscription.sgml > > 1. > connect (boolean) > > Specifies whether the CREATE SUBSCRIPTION command should connect > to the publisher at all. The default is true. Setting this to false > will force the values of create_slot, enabled, copy_data, and failover > to false. (You cannot combine setting connect to false with setting > create_slot, enabled, copy_data, or failover to true.) > > ~ > > I don't think the first part "Setting this to false will force the > values ... failover to false." is strictly correct. > > I think is correct to say all those *other* properties (create_slot, > enabled, copy_data) are forced to false because those otherwise have > default true values. > So, won't when connect=false, the user has to explicitly provide such values (create_slot, enabled, etc.) as false? If so, is using 'force' strictly correct? > ~~~ > > 2. > <para> > Since no connection is made when this option is > <literal>false</literal>, no tables are subscribed. To initiate > replication, you must manually create the replication slot, enable > - the subscription, and refresh the subscription. See > + the failover if required, enable the subscription, and refresh the > + subscription. See > <xref > linkend="logical-replication-subscription-examples-deferred-slot"/> > for examples. > </para> > > IMO "see the failover if required" is very vague. See what failover? > AFAICS, the committed docs says: "To initiate replication, you must manually create the replication slot, enable the failover if required, ...". I am not sure what you are referring to. > > ====== > src/bin/pg_upgrade/t/004_subscription.pl > > 5. > -# The subscription's running status should be preserved. Old subscription > -# regress_sub1 should be enabled and old subscription regress_sub2 should be > -# disabled. > +# The subscription's running status and failover option should be preserved. > +# Old subscription regress_sub1 should have enabled and failover as true while > +# old subscription regress_sub2 should have enabled and failover as false. > $result = > $new_sub->safe_psql('postgres', > - "SELECT subname, subenabled FROM pg_subscription ORDER BY subname"); > -is( $result, qq(regress_sub1|t > -regress_sub2|f), > + "SELECT subname, subenabled, subfailover FROM pg_subscription ORDER > BY subname"); > +is( $result, qq(regress_sub1|t|t > +regress_sub2|f|f), > "check that the subscription's running status are preserved"); > > ~ > > Calling those "old subscriptions" seems misleading. Aren't these the > new/upgraded subscriptions being checked here? > Again the quoted wording is not introduced by this patch. But, I see your point and it is better if you can start a separate thread for it. -- With Regards, Amit Kapila.
On Wed, Jan 31, 2024 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 31, 2024 at 7:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > I saw that v73-0001 was pushed, but it included some last-minute > > changes that I did not get a chance to check yesterday. > > > > Here are some review comments for the new parts of that patch. > > > > ====== > > doc/src/sgml/ref/create_subscription.sgml > > > > 1. > > connect (boolean) > > > > Specifies whether the CREATE SUBSCRIPTION command should connect > > to the publisher at all. The default is true. Setting this to false > > will force the values of create_slot, enabled, copy_data, and failover > > to false. (You cannot combine setting connect to false with setting > > create_slot, enabled, copy_data, or failover to true.) > > > > ~ > > > > I don't think the first part "Setting this to false will force the > > values ... failover to false." is strictly correct. > > > > I think is correct to say all those *other* properties (create_slot, > > enabled, copy_data) are forced to false because those otherwise have > > default true values. > > > > So, won't when connect=false, the user has to explicitly provide such > values (create_slot, enabled, etc.) as false? If so, is using 'force' > strictly correct? Perhaps the original docs text could be worded differently; I think the word "force" here just meant setting connection=false forces/causes/makes those other options behave "as if" they had been set to false without the user explicitly doing anything to them. The point is they are made to behave *differently* to their normal defaults. So, connect=false ==> this actually sets enabled=false (you can see this with \dRs+), which is different to the default setting of 'enabled' connect=false ==> will not create a slot (because there is no connection), which is different to the default behaviour for 'create-slot' connect=false ==> will not copy tables (because there is no connection), which is different to the default behaviour for 'copy_data;' OTOH, failover is different connect=false ==> failover is not possible (because there is no connection), but the 'failover' default is false anyway, so no change to the default behaviour for this one. > > > ~~~ > > > > 2. > > <para> > > Since no connection is made when this option is > > <literal>false</literal>, no tables are subscribed. To initiate > > replication, you must manually create the replication slot, enable > > - the subscription, and refresh the subscription. See > > + the failover if required, enable the subscription, and refresh the > > + subscription. See > > <xref > > linkend="logical-replication-subscription-examples-deferred-slot"/> > > for examples. > > </para> > > > > IMO "see the failover if required" is very vague. See what failover? > > > > AFAICS, the committed docs says: "To initiate replication, you must > manually create the replication slot, enable the failover if required, > ...". I am not sure what you are referring to. > My mistake. I was misreading the patch code without applying the patch. Sorry for the noise. > > > > ====== > > src/bin/pg_upgrade/t/004_subscription.pl > > > > 5. > > -# The subscription's running status should be preserved. Old subscription > > -# regress_sub1 should be enabled and old subscription regress_sub2 should be > > -# disabled. > > +# The subscription's running status and failover option should be preserved. 
> > +# Old subscription regress_sub1 should have enabled and failover as true while > > +# old subscription regress_sub2 should have enabled and failover as false. > > $result = > > $new_sub->safe_psql('postgres', > > - "SELECT subname, subenabled FROM pg_subscription ORDER BY subname"); > > -is( $result, qq(regress_sub1|t > > -regress_sub2|f), > > + "SELECT subname, subenabled, subfailover FROM pg_subscription ORDER > > BY subname"); > > +is( $result, qq(regress_sub1|t|t > > +regress_sub2|f|f), > > "check that the subscription's running status are preserved"); > > > > ~ > > > > Calling those "old subscriptions" seems misleading. Aren't these the > > new/upgraded subscriptions being checked here? > > > > Again the quoted wording is not introduced by this patch. But, I see > your point and it is better if you can start a separate thread for it. > OK. I created a separate thread for this [1] ====== [1] https://www.postgresql.org/message-id/CAHut+Pu1usLPHRySPTacY1K_Q-ddSRXNFhmj_2u1NfqBC1ytng@mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
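To illustrate the point with a minimal sketch (the subscription, publication, and connection values are made up; subfailover is the catalog column checked by the pg_upgrade test discussed above):

    -- connect = false: no connection is made, so no slot is created,
    -- no tables are copied, and the subscription is left disabled.
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=primary dbname=postgres'
        PUBLICATION mypub
        WITH (connect = false);

    -- enabled is forced away from its default (true), while failover
    -- simply keeps its default (false).
    SELECT subname, subenabled, subfailover FROM pg_subscription;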
On Wednesday, January 31, 2024 9:57 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi, > > I saw that v73-0001 was pushed, but it included some last-minute > changes that I did not get a chance to check yesterday. > > Here are some review comments for the new parts of that patch. > > ====== > doc/src/sgml/ref/create_subscription.sgml > > 1. > connect (boolean) > > Specifies whether the CREATE SUBSCRIPTION command should connect > to the publisher at all. The default is true. Setting this to false > will force the values of create_slot, enabled, copy_data, and failover > to false. (You cannot combine setting connect to false with setting > create_slot, enabled, copy_data, or failover to true.) > > ~ > > I don't think the first part "Setting this to false will force the > values ... failover to false." is strictly correct. > > I think is correct to say all those *other* properties (create_slot, > enabled, copy_data) are forced to false because those otherwise have > default true values. But the 'failover' has default false, so it > cannot get force-changed at all because you can't set connect to false > when failover is true as the second part ("You cannot combine...") > explains. > > IMO remove 'failover' from that first sentence. > > > 3. > dump can be restored without requiring network access to the remote > servers. It is then up to the user to reactivate the subscriptions in a > suitable way. If the involved hosts have changed, the connection > - information might have to be changed. It might also be appropriate to > + information might have to be changed. If the subscription needs to > + be enabled for > + <link > linkend="sql-createsubscription-params-with-failover"><literal>failover</lit > eral></link>, > + then same needs to be done by executing > + <link linkend="sql-altersubscription-params-set"> > + <literal>ALTER SUBSCRIPTION ... SET(failover = true)</literal></link> > + after the slot has been created. It might also be appropriate to > > "then same needs to be done" (English?) > > BEFORE > If the subscription needs to be enabled for failover, then same needs > to be done by executing ALTER SUBSCRIPTION ... SET(failover = true) > after the slot has been created. > > SUGGESTION > If the subscription needs to be enabled for failover, execute ALTER > SUBSCRIPTION ... SET(failover = true) after the slot has been created. > > ====== > src/backend/commands/subscriptioncmds.c > > 4. > #define SUBOPT_RUN_AS_OWNER 0x00001000 > -#define SUBOPT_LSN 0x00002000 > -#define SUBOPT_ORIGIN 0x00004000 > +#define SUBOPT_FAILOVER 0x00002000 > +#define SUBOPT_LSN 0x00004000 > +#define SUBOPT_ORIGIN 0x00008000 > + > > A spurious blank line was added. > Here is a small patch to address the comment 3 and 4. The discussion for comment 1 is still going on, so we can update the patch once it's concluded. Best Regards, Hou zj
Attachment
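To make the documented sequence concrete, here is a rough sketch of the deferred-slot workflow (object names are made up; the failover handling assumes the option added in v73-0001):

    -- On the subscriber: create the subscription without connecting.
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=primary dbname=postgres'
        PUBLICATION mypub
        WITH (connect = false);

    -- On the publisher: create the replication slot manually.
    SELECT pg_create_logical_replication_slot('mysub', 'pgoutput');

    -- On the subscriber: enable failover if required (while the
    -- subscription is still disabled), then enable and refresh it.
    ALTER SUBSCRIPTION mysub SET (failover = true);
    ALTER SUBSCRIPTION mysub ENABLE;
    ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;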
On Tue, Jan 30, 2024 at 9:53 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Jan 30, 2024 at 4:06 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > PFA v73-0001 which addresses the above comments. Other patches will be > > rebased and posted after pushing this one. > > Since v73-0001 is pushed, PFA rest of the patches. Changes are: > > 1) Rebased the patches. > 2) Ran pg_indent on all. > 3) patch001: Updated logicaldecoding.sgml for dbname requirement in > primary_conninfo for slot-synchronization. > Thank you for updating the patches. As for the slotsync worker patch, is there any reason why 0001, 0002, and 0004 patches are still separated? Beside, here are some comments on v74 0001, 0002, and 0004 patches: --- +static char * +wait_for_valid_params_and_get_dbname(void) +{ + char *dbname; + int rc; + + /* Sanity check. */ + Assert(enable_syncslot); + + for (;;) + { + if (validate_parameters_and_get_dbname(&dbname)) + break; + ereport(LOG, errmsg("skipping slot synchronization")); + + ProcessSlotSyncInterrupts(NULL); When reading this function, I expected that the slotsync worker would resume working once the parameters became valid, but it was not correct. For example, if I changed hot_standby_feedback from off to on, the slotsync worker reads the config file, exits, and then restarts. Given that the slotsync worker ends up exiting on parameter changes anyway, why do we want to have it wait for parameters to become valid? IIUC even if the slotsync worker exits when a parameter is not valid, it restarts at some intervals. --- +bool +SlotSyncWorkerCanRestart(void) +{ +#define SLOTSYNC_RESTART_INTERVAL_SEC 10 + IIUC depending on how busy the postmaster is and the timing, the user could wait for 1 min to re-launch the slotsync worker. But I think the user might want to re-launch the slotsync worker more quickly for example when the slotsync worker restarts due to parameter changes. IIUC SloSyncWorkerCanRestart() doesn't consider the fact that the slotsync worker previously exited with 0 or 1. --- + /* We are a normal standby */ + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); + Assert(!isnull); What do you mean by "normal standby"? --- + appendStringInfo(&cmd, + "SELECT pg_is_in_recovery(), count(*) = 1" + " FROM pg_replication_slots" + " WHERE slot_type='physical' AND slot_name=%s", + quote_literal_cstr(PrimarySlotName)); I think we need to make "pg_replication_slots" schema-qualified. --- + errdetail("The primary server slot \"%s\" specified by" + " \"%s\" is not valid.", + PrimarySlotName, "primary_slot_name")); and + errmsg("slot sync worker will shutdown because" + " %s is disabled", "enable_syncslot")); It's better to write it in one line for better greppability. --- When I dropped a database on the primary that has a failover slot, I got the following logs on the standby: 2024-01-31 17:25:21.750 JST [1103933] FATAL: replication slot "s" is active for PID 1103935 2024-01-31 17:25:21.750 JST [1103933] CONTEXT: WAL redo at 0/3020D20 for Database/DROP: dir 1663/16384 2024-01-31 17:25:21.751 JST [1103930] LOG: startup process (PID 1103933) exited with exit code 1 It seems that because the slotsync worker created the slot on the standby, the slot's active_pid is still valid. That is why the startup process could not drop the slot. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
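On the schema-qualification point, presumably the query would end up looking something like this (a sketch only, with a literal slot name in place of the quoted parameter):

    SELECT pg_catalog.pg_is_in_recovery(), count(*) = 1
      FROM pg_catalog.pg_replication_slots
     WHERE slot_type = 'physical' AND slot_name = 'my_primary_slot';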
On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Thank you for updating the patches. As for the slotsync worker patch, > is there any reason why 0001, 0002, and 0004 patches are still > separated? > No specific reason, it could be easier to review those parts. > > Beside, here are some comments on v74 0001, 0002, and 0004 patches: > > --- > +static char * > +wait_for_valid_params_and_get_dbname(void) > +{ > + char *dbname; > + int rc; > + > + /* Sanity check. */ > + Assert(enable_syncslot); > + > + for (;;) > + { > + if (validate_parameters_and_get_dbname(&dbname)) > + break; > + ereport(LOG, errmsg("skipping slot synchronization")); > + > + ProcessSlotSyncInterrupts(NULL); > > When reading this function, I expected that the slotsync worker would > resume working once the parameters became valid, but it was not > correct. For example, if I changed hot_standby_feedback from off to > on, the slotsync worker reads the config file, exits, and then > restarts. Given that the slotsync worker ends up exiting on parameter > changes anyway, why do we want to have it wait for parameters to > become valid? > Right, the reason for waiting is to avoid repeated re-start of slotsync worker if the required parameter is not changed. To follow that, I think we should simply continue when the required parameter is changed and is valid. But, I think during actual slotsync, if connection_info is changed then there is no option but to restart. > > --- > +bool > +SlotSyncWorkerCanRestart(void) > +{ > +#define SLOTSYNC_RESTART_INTERVAL_SEC 10 > + > > IIUC depending on how busy the postmaster is and the timing, the user > could wait for 1 min to re-launch the slotsync worker. But I think the > user might want to re-launch the slotsync worker more quickly for > example when the slotsync worker restarts due to parameter changes. > IIUC SloSyncWorkerCanRestart() doesn't consider the fact that the > slotsync worker previously exited with 0 or 1. > Considering my previous where we don't want to restart for a required parameter change, isn't it better to avoid repeated restart (say when the user gave an invalid dbname)? BTW, I think this restart interval is added based on your previous complaint [1]. > > --- > When I dropped a database on the primary that has a failover slot, I > got the following logs on the standby: > > 2024-01-31 17:25:21.750 JST [1103933] FATAL: replication slot "s" is > active for PID 1103935 > 2024-01-31 17:25:21.750 JST [1103933] CONTEXT: WAL redo at 0/3020D20 > for Database/DROP: dir 1663/16384 > 2024-01-31 17:25:21.751 JST [1103930] LOG: startup process (PID > 1103933) exited with exit code 1 > > It seems that because the slotsync worker created the slot on the > standby, the slot's active_pid is still valid. > But we release the slot after sync. And we do take a shared lock on the database to make the startup process wait for slotsync. There is one gap which is that we don't reset active_pid for temp slots in ReplicationSlotRelease(), so for temp slots such an error can occur but OTOH, we immediately make the slot persistent after sync. As per my understanding, it is only possible to get this error if the initial sync doesn't happen and the slot remains temporary. Is that your case? How did reproduce this? That is why the startup > process could not drop the slot. > [1] - https://www.postgresql.org/message-id/CAD21AoApGoTZu7D_7%3DbVYQqKnj%2BPZ2Rz%2Bnc8Ky1HPQMS_XL6%2BA%40mail.gmail.com -- With Regards, Amit Kapila.
On Wed, Jan 31, 2024 at 7:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Thank you for updating the patches. As for the slotsync worker patch, > > is there any reason why 0001, 0002, and 0004 patches are still > > separated? > > > > No specific reason, it could be easier to review those parts. Okay, I think we can merge 0001 and 0002 at least as we don't need bgworker codes. > > > > > Beside, here are some comments on v74 0001, 0002, and 0004 patches: > > > > --- > > +static char * > > +wait_for_valid_params_and_get_dbname(void) > > +{ > > + char *dbname; > > + int rc; > > + > > + /* Sanity check. */ > > + Assert(enable_syncslot); > > + > > + for (;;) > > + { > > + if (validate_parameters_and_get_dbname(&dbname)) > > + break; > > + ereport(LOG, errmsg("skipping slot synchronization")); > > + > > + ProcessSlotSyncInterrupts(NULL); > > > > When reading this function, I expected that the slotsync worker would > > resume working once the parameters became valid, but it was not > > correct. For example, if I changed hot_standby_feedback from off to > > on, the slotsync worker reads the config file, exits, and then > > restarts. Given that the slotsync worker ends up exiting on parameter > > changes anyway, why do we want to have it wait for parameters to > > become valid? > > > > Right, the reason for waiting is to avoid repeated re-start of > slotsync worker if the required parameter is not changed. To follow > that, I think we should simply continue when the required parameter is > changed and is valid. But, I think during actual slotsync, if > connection_info is changed then there is no option but to restart. Agreed. > > > > --- > > +bool > > +SlotSyncWorkerCanRestart(void) > > +{ > > +#define SLOTSYNC_RESTART_INTERVAL_SEC 10 > > + > > > > IIUC depending on how busy the postmaster is and the timing, the user > > could wait for 1 min to re-launch the slotsync worker. But I think the > > user might want to re-launch the slotsync worker more quickly for > > example when the slotsync worker restarts due to parameter changes. > > IIUC SloSyncWorkerCanRestart() doesn't consider the fact that the > > slotsync worker previously exited with 0 or 1. > > > > Considering my previous where we don't want to restart for a required > parameter change, isn't it better to avoid repeated restart (say when > the user gave an invalid dbname)? BTW, I think this restart interval > is added based on your previous complaint [1]. I think it's useful that the slotsync worker restarts immediately when a required parameter is changed but waits to restart when it exits with an error. IIUC the apply worker does so; if it restarts due to a subscription parameter change, it resets the last-start time so that the launcher will restart it without waiting. But if it exits with an error, the launcher waits for wal_retrieve_retry_interval. I don't think the slotsync worker must follow this behavior but I feel it's useful behavior. 
> > > > > --- > > When I dropped a database on the primary that has a failover slot, I > > got the following logs on the standby: > > > > 2024-01-31 17:25:21.750 JST [1103933] FATAL: replication slot "s" is > > active for PID 1103935 > > 2024-01-31 17:25:21.750 JST [1103933] CONTEXT: WAL redo at 0/3020D20 > > for Database/DROP: dir 1663/16384 > > 2024-01-31 17:25:21.751 JST [1103930] LOG: startup process (PID > > 1103933) exited with exit code 1 > > > > It seems that because the slotsync worker created the slot on the > > standby, the slot's active_pid is still valid. > > > > But we release the slot after sync. And we do take a shared lock on > the database to make the startup process wait for slotsync. There is > one gap which is that we don't reset active_pid for temp slots in > ReplicationSlotRelease(), so for temp slots such an error can occur > but OTOH, we immediately make the slot persistent after sync. As per > my understanding, it is only possible to get this error if the initial > sync doesn't happen and the slot remains temporary. Is that your case? > How did reproduce this? I created a failover slot manually on the primary and dropped the database where the failover slot is created. So this would not happen in normal cases. BTW I've tested the following switch/fail-back scenario but it seems not to work fine. Am I missing something? Setup: node1 is the primary, node2 is the physical standby for node1, and node3 is the subscriber connecting to node1. Steps: 1. [node1]: create a table and a publication for the table. 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1). 3. [node3]: create a subscription with failover = true for the publication. 4. [node2]: promote to the new standby. 5. [node3]: alter subscription to connect the new primary, node2. 6. [node1]: stop, set enable_syncslot = on (and other required parameters), then start as a new standby. Then I got the error "exiting from slot synchronization because same name slot "test_sub" already exists on the standby". The logical replication slot that was created on the old primary (node1) has been synchronized to the old standby (node2). Therefore on node2, the slot's "synced" field is true. However, once node1 starts as the new standby with slot synchronization, the slotsync worker cannot synchronize the slot because the slot's "synced" field on the primary is false. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote:
Attach the V72-0001 which addressed above comments, other patches will be rebased and posted after pushing first patch. Thanks Shveta for helping address the comments.
While working on another patch I noticed a new NOTICE message:
NOTICE: changed the failover state of replication slot "foo" on publisher to false
I wasn't paying much attention to this thread until I started reading the 2
patches that were recently committed. The message above surprised me because
pg_createsubscriber starts to emit this message. The reason is that it doesn't
create the replication slot during the CREATE SUBSCRIPTION. Instead, it creates
the replication slot with failover = false, and no such option is specified
during CREATE SUBSCRIPTION, which means it uses the default value (failover =
false). I expected not to see any message because it is *not* changing the
behavior. I was wrong. It doesn't check the failover state on the publisher; it
just executes walrcv_alter_slot() and emits a message.
IMO if we are changing an outstanding property on node A from node B, node B
already knows (or might know) about that behavior change (because it is sending
the command); node A, however, doesn't (unless log_replication_commands = on --
it is not the default).
Do we really need this message as NOTICE? I would set it to DEBUG1 if it is
worth keeping, or even remove it (if we consider there are other ways to obtain the same
information).
On Wed, Jan 31, 2024 at 9:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Jan 31, 2024 at 7:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Considering my previous where we don't want to restart for a required > > parameter change, isn't it better to avoid repeated restart (say when > > the user gave an invalid dbname)? BTW, I think this restart interval > > is added based on your previous complaint [1]. > > I think it's useful that the slotsync worker restarts immediately when > a required parameter is changed but waits to restart when it exits > with an error. IIUC the apply worker does so; if it restarts due to a > subscription parameter change, it resets the last-start time so that > the launcher will restart it without waiting. > Agreed, this idea sounds good to me. > > > > > > > > --- > > > When I dropped a database on the primary that has a failover slot, I > > > got the following logs on the standby: > > > > > > 2024-01-31 17:25:21.750 JST [1103933] FATAL: replication slot "s" is > > > active for PID 1103935 > > > 2024-01-31 17:25:21.750 JST [1103933] CONTEXT: WAL redo at 0/3020D20 > > > for Database/DROP: dir 1663/16384 > > > 2024-01-31 17:25:21.751 JST [1103930] LOG: startup process (PID > > > 1103933) exited with exit code 1 > > > > > > It seems that because the slotsync worker created the slot on the > > > standby, the slot's active_pid is still valid. > > > > > > > But we release the slot after sync. And we do take a shared lock on > > the database to make the startup process wait for slotsync. There is > > one gap which is that we don't reset active_pid for temp slots in > > ReplicationSlotRelease(), so for temp slots such an error can occur > > but OTOH, we immediately make the slot persistent after sync. As per > > my understanding, it is only possible to get this error if the initial > > sync doesn't happen and the slot remains temporary. Is that your case? > > How did reproduce this? > > I created a failover slot manually on the primary and dropped the > database where the failover slot is created. So this would not happen > in normal cases. > Right, it won't happen in normal cases (say for walsender). This can happen in some cases even without this patch as noted in comments just above active_pid check in ReplicationSlotsDropDBSlots(). Now, we need to think whether we should just update the comments above active_pid check to explain this case or try to engineer some solution for this not-so-common case. I guess if we want a solution we need to stop slotsync worker temporarily till the drop database WAL is applied or something like that. > BTW I've tested the following switch/fail-back scenario but it seems > not to work fine. Am I missing something? > > Setup: > node1 is the primary, node2 is the physical standby for node1, and > node3 is the subscriber connecting to node1. > > Steps: > 1. [node1]: create a table and a publication for the table. > 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1). > 3. [node3]: create a subscription with failover = true for the publication. > 4. [node2]: promote to the new standby. > 5. [node3]: alter subscription to connect the new primary, node2. > 6. [node1]: stop, set enable_syncslot = on (and other required > parameters), then start as a new standby. > > Then I got the error "exiting from slot synchronization because same > name slot "test_sub" already exists on the standby". 
> > The logical replication slot that was created on the old primary > (node1) has been synchronized to the old standby (node2). Therefore on > node2, the slot's "synced" field is true. However, once node1 starts > as the new standby with slot synchronization, the slotsync worker > cannot synchronize the slot because the slot's "synced" field on the > primary is false. > Yeah, we avoided doing anything in this case because the user could have manually created another slot with the same name on standby. Unlike WAL slots can be modified on standby as we allow decoding on standby, so we can't allow to overwrite the existing slots. We won't be able to distinguish whether the existing slot was a slot that the user wants to sync with primary or a slot created on standby to perform decoding. I think in this case user first needs to drop the slot on new standby. We probably need to document it as well unless we decide to do something else. What do you think? -- With Regards, Amit Kapila.
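For the record, the manual step mentioned above amounts to something like the following on node1 before it starts synchronizing slots as the new standby (slot name taken from the scenario above):

    -- Drop the leftover failover slot so that the slotsync worker can
    -- recreate it from the new primary (node2).
    SELECT pg_drop_replication_slot('test_sub');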
On Thu, Feb 1, 2024 at 8:15 AM Euler Taveira <euler@eulerto.com> wrote: > > On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote: > > Attach the V72-0001 which addressed above comments, other patches will be > rebased and posted after pushing first patch. Thanks Shveta for helping address > the comments. > > > While working on another patch I noticed a new NOTICE message: > > NOTICE: changed the failover state of replication slot "foo" on publisher to false > > I wasn't paying much attention to this thread then I start reading the 2 > patches that was recently committed. The message above surprises me because > pg_createsubscriber starts to emit this message. The reason is that it doesn't > create the replication slot during the CREATE SUBSCRIPTION. Instead, it creates > the replication slot with failover = false and no such option is informed > during CREATE SUBSCRIPTION which means it uses the default value (failover = > false). I expect that I don't see any message because it is *not* changing the > behavior. I was wrong. It doesn't check the failover state on publisher, it > just executes walrcv_alter_slot() and emits a message. > > IMO if we are changing an outstanding property on node A from node B, node B > already knows (or might know) about that behavior change (because it is sending > the command), however, node A doesn't (unless log_replication_commands = on -- > it is not the default). > > Do we really need this message as NOTICE? > The reason for adding this NOTICE was to keep it similar to other Notice messages in these commands like create/drop slot. However, here the difference is we may not have altered the slot as the property is already the same as we want to set on the publisher. So, I am not sure whether we should follow the existing behavior or just get rid of it. And then do we remove similar NOTICE in AlterSubscription() as well? Normally, I think NOTICE intends to let users know if we did anything with slots while executing subscription commands. Does anyone else have an opinion on this point? A related point, I think we can avoid setting the 'failover' property in ReplicationSlotAlter() if it is not changed, the advantage is we will avoid saving slots. OTOH, this won't be a frequent operation so we can leave it as it is as well. -- With Regards, Amit Kapila.
On Tue, Jan 30, 2024 at 11:53 PM shveta malik <shveta.malik@gmail.com> wrote:
On Tue, Jan 30, 2024 at 4:06 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> PFA v73-0001 which addresses the above comments. Other patches will be
> rebased and posted after pushing this one.
Since v73-0001 is pushed, PFA rest of the patches. Changes are:
1) Rebased the patches.
2) Ran pg_indent on all.
3) patch001: Updated logicaldecoding.sgml for dbname requirement in
primary_conninfo for slot-synchronization.
thanks
Shveta
Just to test the behaviour, I modified the code to make the failover flag default to "true" while creating a subscription and ran the regression tests. I only saw the expected errors.
1. Make check in postgres root folder - all failures are because of differences when listing subscriptions, as the failover flag is now enabled. The diff is attached for regress.
2. Make check in src/test/subscription - no failures All tests successful.
Files=34, Tests=457, 81 wallclock secs ( 0.14 usr 0.05 sys + 9.53 cusr 13.00 csys = 22.72 CPU)
Result: PASS
3. Make check in src/test/recovery - 3 failures Test Summary Report
-------------------
t/027_stream_regress.pl (Wstat: 256 Tests: 6 Failed: 1)
Failed test: 2
Non-zero exit status: 1
t/035_standby_logical_decoding.pl (Wstat: 7424 Tests: 8 Failed: 0)
Non-zero exit status: 29
Parse errors: No plan found in TAP output t/050_standby_failover_slots_sync.pl (Wstat: 7424 Tests: 5 Failed: 0)
Non-zero exit status: 29
Parse errors: No plan found in TAP output
3a. Analysis of t/027_stream_regress.pl - No, 027 fails with the same issue as "make check" in the postgres root folder (for which I attached the diffs). 027 is about running the standard regression tests with streaming replication. Since the regression tests fail because listing subscriptions now shows failover as enabled, 027 also fails in the same way with streaming replication.
3b. Analysis of t/035_standby_logical_decoding.pl - In this test case, they attempt to create a subscription from the subscriber to the standby ##################################################
# Test that we can subscribe on the standby with the publication # created on the primary.
##################################################
Now, this fails because creating a subscription on the standby with failover enabled will result in an error:
I see the following error in the log:
2024-01-28 23:51:30.425 EST [23332] tap_sub STATEMENT: CREATE_REPLICATION_SLOT "tap_sub" LOGICAL pgoutput (FAILOVER, SNAPSHOT 'nothing')
2024-01-28 23:51:30.425 EST [23332] tap_sub ERROR: cannot create replication slot with failover enabled on the standby
I discussed this with Shveta and she agreed that this is the expected behaviour as we don't support failover to cascading standby yet.
3c. Analysis of t/050_standby_failover_slots_sync.pl - This is a new test case created for this patch, and it creates a subscription without failover enabled to make sure that a subscription with failover disabled does not depend on sync on the standby, but this fails because failover is now enabled by default.
In summary, I don't think these issues are actual bugs; they are expected behaviour changes.
regards,
Ajin Cherian
Fujitsu Australia
Here are some review comments for v740001. ====== src/sgml/logicaldecoding.sgml 1. + <sect2 id="logicaldecoding-replication-slots-synchronization"> + <title>Replication Slot Synchronization</title> + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the <literal>failover</literal> option during slot + creation and setting + <link linkend="guc-enable-syncslot"><varname>enable_syncslot</varname></link> + on the standby. For the synchronization + to work, it is mandatory to have a physical replication slot between the + primary and the standby, and + <link linkend="guc-hot-standby-feedback"><varname>hot_standby_feedback</varname></link> + must be enabled on the standby. It is also necessary to specify a valid + <literal>dbname</literal> in the + <link linkend="guc-primary-conninfo"><varname>primary_conninfo</varname></link> + string, which is used for slot synchronization and is ignored for streaming. + </para> IMO we don't need to repeat that last part ", which is used for slot synchronization and is ignored for streaming." because that is a detail about the primary_conninfo GUC, and the same information is already described in that GUC section. ====== 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) # <para> - If true, the slot is enabled to be synced to the standbys. + If true, the slot is enabled to be synced to the standbys + so that logical replication can be resumed after failover. </para> This also should have the sentence "The default is false.", e.g. the same as the same option in CREATE_REPLICATION_SLOT says. ====== synchronize_one_slot 3. + /* + * Make sure that concerned WAL is received and flushed before syncing + * slot to target lsn received from the primary server. + * + * This check will never pass if on the primary server, user has + * configured standby_slot_names GUC correctly, otherwise this can hit + * frequently. + */ + latestFlushPtr = GetStandbyFlushRecPtr(NULL); + if (remote_slot->confirmed_lsn > latestFlushPtr) BEFORE This check will never pass if on the primary server, user has configured standby_slot_names GUC correctly, otherwise this can hit frequently. SUGGESTION (simpler way to say the same thing?) This will always be the case unless the standby_slot_names GUC is not correctly configured on the primary server. ~~~ 4. + /* User created slot with the same name exists, raise ERROR. */ /User created/User-created/ ~~~ 5. synchronize_slots, and also drop_obsolete_slots + /* + * Use shared lock to prevent a conflict with + * ReplicationSlotsDropDBSlots(), trying to drop the same slot while + * drop-database operation. + */ (same code comment is in a couple of places) SUGGESTION (while -> during, etc.) Use a shared lock to prevent conflicting with ReplicationSlotsDropDBSlots() trying to drop the same slot during a drop-database operation. ~~~ 6. validate_parameters_and_get_dbname strcmp() just for the empty string "" might be overkill. 6a. + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) SUGGESTION if (PrimarySlotName == NULL || *PrimarySlotName == '\0') ~~ 6b. + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) SUGGESTION if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') ====== Kind Regards, Peter Smith. Fujitsu Australia
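As a concrete illustration of the prerequisites in that paragraph, here is a rough configuration sketch (GUC names as in the current patch set; host, port, and slot names are made up, and whether a reload or a restart is needed for each setting is not shown):

    -- On the standby:
    ALTER SYSTEM SET enable_syncslot = on;
    ALTER SYSTEM SET hot_standby_feedback = on;
    ALTER SYSTEM SET primary_slot_name = 'physical_slot1';
    ALTER SYSTEM SET primary_conninfo = 'host=primary port=5432 dbname=postgres';
    SELECT pg_reload_conf();

    -- On the primary: the physical slot used by the standby, plus
    -- standby_slot_names as discussed for synchronize_one_slot().
    SELECT pg_create_physical_replication_slot('physical_slot1');
    ALTER SYSTEM SET standby_slot_names = 'physical_slot1';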
On Wed, Jan 31, 2024 at 9:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Jan 31, 2024 at 7:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Thank you for updating the patches. As for the slotsync worker patch, > > > is there any reason why 0001, 0002, and 0004 patches are still > > > separated? > > > > > > > No specific reason, it could be easier to review those parts. > > Okay, I think we can merge 0001 and 0002 at least as we don't need > bgworker codes. > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested by you though I have a few minor comments on 0002 and 0004. I was thinking about what will be a logical way to split the slot sync worker patch (combined result of 0001, 0002, and 0004), and one idea occurred to me is that we can have the first patch as synchronize_solts() API and the functionality required to implement that API then the second patch would be a slot sync worker which uses that API to synchronize slots and does all the required validations. Any thoughts? Few minor comments on 0002 and 0004 ================================ 1. The comments above HandleChildCrash() should mention about slot sync worker 2. --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -42,6 +42,7 @@ #include "replication/slot.h" #include "replication/syncrep.h" #include "replication/walsender.h" +#include "replication/logicalworker.h" ... --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -43,6 +43,7 @@ #include "postmaster/autovacuum.h" #include "postmaster/postmaster.h" #include "replication/slot.h" +#include "replication/logicalworker.h" These new includes don't appear to be in alphabetical order. 3. + /* We can not have logical without replication */ + if (!replication) + Assert(!logical); I think we can cover both these conditions via Assert -- With Regards, Amit Kapila.
On Thu, Jan 25, 2024 at 11:26 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Wed, Jan 24, 2024 at 04:09:15PM +0530, shveta malik wrote: > > On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then for > > > the current feature I think "failover" and "on" should be the values to turn the > > > feature on (assuming "on" would mean "all kind of supported slots"). > > > > Even if others agree and we change this GUC name to > > "sync_replication_slots", I feel we should keep the values as "on" and > > "off" currently, where "on" would mean 'sync failover slots' (docs can > > state that clearly). > > I gave more thoughts on it and I think the values should only be "failover" or > "off". > > The reason is that if we allow "on" and change the "on" behavior in future > versions (to support more than failover slots) then that would change the behavior > for the ones that used "on". > I again thought on this point and feel that even if we start to sync say physical slots their purpose would also be to allow failover/switchover, otherwise, there is no use of syncing the slots. So, by that theory, we can just go for naming it as sync_failover_slots or simply sync_slots with values 'off' and 'on'. Now, if these are used for switchover then there is an argument that adding 'failover' in the GUC name could be confusing but I feel 'failover' is used widely enough that it shouldn't be a problem for users to understand, otherwise, we can go with simple name like sync_slots as well. Thoughts? -- With Regards, Amit Kapila.
On Wed, Jan 31, 2024 at 10:40 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Jan 31, 2024 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think is correct to say all those *other* properties (create_slot, > > > enabled, copy_data) are forced to false because those otherwise have > > > default true values. > > > > > > > So, won't when connect=false, the user has to explicitly provide such > > values (create_slot, enabled, etc.) as false? If so, is using 'force' > > strictly correct? > > Perhaps the original docs text could be worded differently; I think > the word "force" here just meant setting connection=false > forces/causes/makes those other options behave "as if" they had been > set to false without the user explicitly doing anything to them. > Okay, I see your point. Let's remove the 'failover' from this part of the sentence. -- With Regards, Amit Kapila.
On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested > by you though I have a few minor comments on 0002 and 0004. I was > thinking about what will be a logical way to split the slot sync > worker patch (combined result of 0001, 0002, and 0004), and one idea > occurred to me is that we can have the first patch as > synchronize_solts() API and the functionality required to implement > that API then the second patch would be a slot sync worker which uses > that API to synchronize slots and does all the required validations. > Any thoughts? If we shift 'synchronize_slots()' to the first patch but there is no caller of it, we may have a compiler warning for the same. The only way it can be done is if we temporarily add SQL function on standby which uses 'synchronize_slots()'. This SQL function can then be removed in later patches where we actually have a caller for 'synchronize_slots'. For the time being, I have merged 1,2, and some parts of 4 into a single patch and separated out libpqrc related changes to the first patch. Attached v75 patch-set. Changes are: 1) Re-arranged the patches: 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are separated out in v75-001 as those are independent changes. 1.2) 'Add logical slot sync capability', 'Slot sync worker as special process' and 'App-name changes' are now merged to single patch which makes v75-002. 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation Document' patches are maintained as is (v75-003 and v75-004 now). 2) Addressed comments by Swada-San, Peter and Amit given in [1], [2], [3] and [4] [1]: https://www.postgresql.org/message-id/CAD21AoDUfnnxP%2By2cg%3DLhP-bQXqFE1z4US-no%3Du30J7X%3D4Z6Aw%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAD21AoAv6FwZ6UPNTj6%3D7A%2B3O2m4utzfL8ZGS6X1EGexikG66A%40mail.gmail.com [3]: https://www.postgresql.org/message-id/CAHut%2BPuDUT7X7ieB9uQE%3DCLznaVVcQDO2GexkHe1Xfw%3DSWnkPA%40mail.gmail.com [4]: https://www.postgresql.org/message-id/CAA4eK1K7hLU2ZT1VX2k3e21c%3DkOZySZqfVDJsfE9vAS2AZ0mig%40mail.gmail.com thanks Shveta
Attachment
On Thu, Feb 1, 2024 at 11:21 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v740001. Thanks Peter for the feedback. > ====== > src/sgml/logicaldecoding.sgml > > 1. > + <sect2 id="logicaldecoding-replication-slots-synchronization"> > + <title>Replication Slot Synchronization</title> > + <para> > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the <literal>failover</literal> option during slot > + creation and setting > + <link linkend="guc-enable-syncslot"><varname>enable_syncslot</varname></link> > + on the standby. For the synchronization > + to work, it is mandatory to have a physical replication slot between the > + primary and the standby, and > + <link linkend="guc-hot-standby-feedback"><varname>hot_standby_feedback</varname></link> > + must be enabled on the standby. It is also necessary to specify a valid > + <literal>dbname</literal> in the > + <link linkend="guc-primary-conninfo"><varname>primary_conninfo</varname></link> > + string, which is used for slot synchronization and is ignored > for streaming. > + </para> > > IMO we don't need to repeat that last part ", which is used for slot > synchronization and is ignored for streaming." because that is a > detail about the primary_conninfo GUC, and the same information is > already described in that GUC section. Modified in v75. > ====== > > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) # > > <para> > - If true, the slot is enabled to be synced to the standbys. > + If true, the slot is enabled to be synced to the standbys > + so that logical replication can be resumed after failover. > </para> > > This also should have the sentence "The default is false.", e.g. the > same as the same option in CREATE_REPLICATION_SLOT says. I have not added this. I feel the default value related details should be present in the 'CREATE' part, it is not meaningful for the "ALTER" part. ALTER does not have any defaults, it just modifies the options given by the user. > ====== > synchronize_one_slot > > 3. > + /* > + * Make sure that concerned WAL is received and flushed before syncing > + * slot to target lsn received from the primary server. > + * > + * This check will never pass if on the primary server, user has > + * configured standby_slot_names GUC correctly, otherwise this can hit > + * frequently. > + */ > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > + if (remote_slot->confirmed_lsn > latestFlushPtr) > > BEFORE > This check will never pass if on the primary server, user has > configured standby_slot_names GUC correctly, otherwise this can hit > frequently. > > SUGGESTION (simpler way to say the same thing?) > This will always be the case unless the standby_slot_names GUC is not > correctly configured on the primary server. It is not true. It will not hit this condition "always" but has higher chances to hit it when standby_slot_names is not configured. I think you meant 'unless the standby_slot_names GUC is correctly configured'. I feel the current comment gives clear info (less confusing) and thus I have not changed it for the time being. I can consider if I get more comments there. > 4. > + /* User created slot with the same name exists, raise ERROR. */ > > /User created/User-created/ Modified. > ~~~ > > 5. synchronize_slots, and also drop_obsolete_slots > > + /* > + * Use shared lock to prevent a conflict with > + * ReplicationSlotsDropDBSlots(), trying to drop the same slot while > + * drop-database operation. 
> + */ > > (same code comment is in a couple of places) > > SUGGESTION (while -> during, etc.) > > Use a shared lock to prevent conflicting with > ReplicationSlotsDropDBSlots() trying to drop the same slot during a > drop-database operation. Modified. > ~~~ > > 6. validate_parameters_and_get_dbname > > strcmp() just for the empty string "" might be overkill. > > 6a. > + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) > > SUGGESTION > if (PrimarySlotName == NULL || *PrimarySlotName == '\0') > > ~~ > > 6b. > + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) > > SUGGESTION > if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') Modified. thanks Shveta
On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > --- > +static char * > +wait_for_valid_params_and_get_dbname(void) > +{ > + char *dbname; > + int rc; > + > + /* Sanity check. */ > + Assert(enable_syncslot); > + > + for (;;) > + { > + if (validate_parameters_and_get_dbname(&dbname)) > + break; > + ereport(LOG, errmsg("skipping slot synchronization")); > + > + ProcessSlotSyncInterrupts(NULL); > > When reading this function, I expected that the slotsync worker would > resume working once the parameters became valid, but it was not > correct. For example, if I changed hot_standby_feedback from off to > on, the slotsync worker reads the config file, exits, and then > restarts. Given that the slotsync worker ends up exiting on parameter > changes anyway, why do we want to have it wait for parameters to > become valid? IIUC even if the slotsync worker exits when a parameter > is not valid, it restarts at some intervals. Thanks for the feedback Changed this functionality in v75. Now we do not exit in wait_for_valid_params_and_get_dbname() on GUC change. We re-validate the new values and if found valid, carry on with slot-syncing else continue waiting. > --- > +bool > +SlotSyncWorkerCanRestart(void) > +{ > +#define SLOTSYNC_RESTART_INTERVAL_SEC 10 > + > > IIUC depending on how busy the postmaster is and the timing, the user > could wait for 1 min to re-launch the slotsync worker. But I think the > user might want to re-launch the slotsync worker more quickly for > example when the slotsync worker restarts due to parameter changes. > IIUC SloSyncWorkerCanRestart() doesn't consider the fact that the > slotsync worker previously exited with 0 or 1. Modified this in v75. As you suggested in [1], we reset last_start_time on GUC change before proc_exit, so that the postmaster restarts worker immediately without waiting. > --- > + /* We are a normal standby */ > + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > + Assert(!isnull); > > What do you mean by "normal standby"? > > --- > + appendStringInfo(&cmd, > + "SELECT pg_is_in_recovery(), count(*) = 1" > + " FROM pg_replication_slots" > + " WHERE slot_type='physical' AND slot_name=%s", > + quote_literal_cstr(PrimarySlotName)); > > I think we need to make "pg_replication_slots" schema-qualified. Modified. > --- > + errdetail("The primary server slot \"%s\" specified by" > + " \"%s\" is not valid.", > + PrimarySlotName, "primary_slot_name")); > > and > > + errmsg("slot sync worker will shutdown because" > + " %s is disabled", "enable_syncslot")); > > It's better to write it in one line for better greppability. Modified. [1]: https://www.postgresql.org/message-id/CAD21AoAv6FwZ6UPNTj6%3D7A%2B3O2m4utzfL8ZGS6X1EGexikG66A%40mail.gmail.com thanks Shveta
On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 31, 2024 at 9:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Jan 31, 2024 at 7:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > Considering my previous where we don't want to restart for a required > > > parameter change, isn't it better to avoid repeated restart (say when > > > the user gave an invalid dbname)? BTW, I think this restart interval > > > is added based on your previous complaint [1]. > > > > I think it's useful that the slotsync worker restarts immediately when > > a required parameter is changed but waits to restart when it exits > > with an error. IIUC the apply worker does so; if it restarts due to a > > subscription parameter change, it resets the last-start time so that > > the launcher will restart it without waiting. > > > > Agreed, this idea sounds good to me. > > > > > > > > > > > > --- > > > > When I dropped a database on the primary that has a failover slot, I > > > > got the following logs on the standby: > > > > > > > > 2024-01-31 17:25:21.750 JST [1103933] FATAL: replication slot "s" is > > > > active for PID 1103935 > > > > 2024-01-31 17:25:21.750 JST [1103933] CONTEXT: WAL redo at 0/3020D20 > > > > for Database/DROP: dir 1663/16384 > > > > 2024-01-31 17:25:21.751 JST [1103930] LOG: startup process (PID > > > > 1103933) exited with exit code 1 > > > > > > > > It seems that because the slotsync worker created the slot on the > > > > standby, the slot's active_pid is still valid. > > > > > > > > > > But we release the slot after sync. And we do take a shared lock on > > > the database to make the startup process wait for slotsync. There is > > > one gap which is that we don't reset active_pid for temp slots in > > > ReplicationSlotRelease(), so for temp slots such an error can occur > > > but OTOH, we immediately make the slot persistent after sync. As per > > > my understanding, it is only possible to get this error if the initial > > > sync doesn't happen and the slot remains temporary. Is that your case? > > > How did reproduce this? > > > > I created a failover slot manually on the primary and dropped the > > database where the failover slot is created. So this would not happen > > in normal cases. > > > > Right, it won't happen in normal cases (say for walsender). This can > happen in some cases even without this patch as noted in comments just > above active_pid check in ReplicationSlotsDropDBSlots(). Now, we need > to think whether we should just update the comments above active_pid > check to explain this case or try to engineer some solution for this > not-so-common case. I guess if we want a solution we need to stop > slotsync worker temporarily till the drop database WAL is applied or > something like that. > > > BTW I've tested the following switch/fail-back scenario but it seems > > not to work fine. Am I missing something? > > > > Setup: > > node1 is the primary, node2 is the physical standby for node1, and > > node3 is the subscriber connecting to node1. > > > > Steps: > > 1. [node1]: create a table and a publication for the table. > > 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1). > > 3. [node3]: create a subscription with failover = true for the publication. > > 4. [node2]: promote to the new standby. > > 5. [node3]: alter subscription to connect the new primary, node2. > > 6. [node1]: stop, set enable_syncslot = on (and other required > > parameters), then start as a new standby. 
> > > > Then I got the error "exiting from slot synchronization because same > > name slot "test_sub" already exists on the standby". > > > > The logical replication slot that was created on the old primary > > (node1) has been synchronized to the old standby (node2). Therefore on > > node2, the slot's "synced" field is true. However, once node1 starts > > as the new standby with slot synchronization, the slotsync worker > > cannot synchronize the slot because the slot's "synced" field on the > > primary is false. > > > > Yeah, we avoided doing anything in this case because the user could > have manually created another slot with the same name on standby. > Unlike WAL slots can be modified on standby as we allow decoding on > standby, so we can't allow to overwrite the existing slots. We won't > be able to distinguish whether the existing slot was a slot that the > user wants to sync with primary or a slot created on standby to > perform decoding. I think in this case user first needs to drop the > slot on new standby.

Yes, but if we do a switch-back further (i.e. in the above case, node1 becomes the primary again and node2 becomes the standby again), the user doesn't need to remove failover slots since they are already marked as "synced". I wonder if we could do something automatically to reduce the user's operation. Also, if we support the slot synchronization feature on a cascading standby in the future, this operation will have to be changed.

Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
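As a concrete sketch of step 5 of the scenario above (the subscription name and connection string are placeholders taken from the scenario, not from the patch):

-- on node3 (subscriber), after node2 has been promoted
ALTER SUBSCRIPTION test_sub DISABLE;
ALTER SUBSCRIPTION test_sub CONNECTION 'host=node2 port=5432 dbname=postgres';
ALTER SUBSCRIPTION test_sub ENABLE;

Disabling first matches the recommendation in the proposed docs to disable subscriptions before promoting the standby and re-enable them after altering the connection string.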
Here are some review comments for v750001. ====== Commit message 1. This patch provides support for non-replication connection in libpqrcv_connect(). ~ 1a. /connection/connections/ ~ 1b. Maybe there needs to be a few more sentences just to describe what you mean by "non-replication connection". ~ 1c. IIUC although the 'replication' parameter is added, in this patch AFAICT every call to the connect function is still passing that argument as true. If that's correct, probably this patch comment should emphasise that this patch doesn't change any functionality at all but is just preparation for later patches which *will* pass false for the replication arg. ~~~ 2. This patch also implements a new API libpqrcv_get_dbname_from_conninfo() to extract database name from the given connection-info ~ /extract database name/the extract database name/ ====== .../libpqwalreceiver/libpqwalreceiver.c 3. + * Apart from walreceiver, the libpq-specific routines here are now being used + * by logical replication worker as well. /worker/workers/ ~~~ 4. libpqrcv_connect /* - * Establish the connection to the primary server for XLOG streaming + * Establish the connection to the primary server. + * + * The connection established could be either a replication one or + * a non-replication one based on input argument 'replication'. And further + * if it is a replication connection, it could be either logical or physical + * based on input argument 'logical'. That first comment ("could be either a replication one or...") seemed a bit meaningless (e.g. it like saying "this boolean argument can be true or false") because it doesn't describe what is the meaning of a "replication connection" versus what is a "non-replication connection". ~~~ 5. /* We can not have logical without replication */ Assert(replication || !logical); if (replication) { keys[++i] = "replication"; vals[i] = logical ? "database" : "true"; if (!logical) { /* * The database name is ignored by the server in replication mode, * but specify "replication" for .pgpass lookup. */ keys[++i] = "dbname"; vals[i] = "replication"; } } keys[++i] = "fallback_application_name"; vals[i] = appname; if (logical) { ... } ~ The Assert already says we cannot be 'logical' if not 'replication', therefore IMO it seemed strange that the code was not refactored to bring that 2nd "if (logical)" code to within the scope of the "if (replication)". e.g. Can't you do something like this: Assert(replication || !logical); if (replication) { ... if (logical) { ... } else { ... } } keys[++i] = "fallback_application_name"; vals[i] = appname; ~~~ 6. libpqrcv_get_dbname_from_conninfo + for (PQconninfoOption *opt = opts; opt->keyword != NULL; ++opt) + { + /* + * If multiple dbnames are specified, then the last one will be + * returned + */ + if (strcmp(opt->keyword, "dbname") == 0 && opt->val && + opt->val[0] != '\0') + dbname = pstrdup(opt->val); + } Should you also pfree the old dbname instead of gathering a bunch of strdups if there happened to be multiple dbnames specified ? SUGGESTION if (strcmp(opt->keyword, "dbname") == 0 && opt->val && *opt->val) { if (dbname) pfree(dbname); dbname = pstrdup(opt->val); } ====== src/include/replication/walreceiver.h 7. /* * walrcv_connect_fn * * Establish connection to a cluster. 'logical' is true if the * connection is logical, and false if the connection is physical. * 'appname' is a name associated to the connection, to use for example * with fallback_application_name or application_name. 
Returns the * details about the connection established, as defined by * WalReceiverConn for each WAL receiver module. On error, NULL is * returned with 'err' including the error generated. */ typedef WalReceiverConn *(*walrcv_connect_fn) (const char *conninfo, bool replication, bool logical, bool must_use_password, const char *appname, char **err); ~ The comment is missing any description of the new parameter 'replication'. ~~~ 8. +/* + * walrcv_get_dbname_from_conninfo_fn + * + * Returns the dbid from the primary_conninfo + */ +typedef char *(*walrcv_get_dbname_from_conninfo_fn) (const char *conninfo); + /dbid/database name/ ====== Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Feb 2, 2024 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > BTW I've tested the following switch/fail-back scenario but it seems > > > not to work fine. Am I missing something? > > > > > > Setup: > > > node1 is the primary, node2 is the physical standby for node1, and > > > node3 is the subscriber connecting to node1. > > > > > > Steps: > > > 1. [node1]: create a table and a publication for the table. > > > 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1). > > > 3. [node3]: create a subscription with failover = true for the publication. > > > 4. [node2]: promote to the new standby. > > > 5. [node3]: alter subscription to connect the new primary, node2. > > > 6. [node1]: stop, set enable_syncslot = on (and other required > > > parameters), then start as a new standby. > > > > > > Then I got the error "exiting from slot synchronization because same > > > name slot "test_sub" already exists on the standby". > > > > > > The logical replication slot that was created on the old primary > > > (node1) has been synchronized to the old standby (node2). Therefore on > > > node2, the slot's "synced" field is true. However, once node1 starts > > > as the new standby with slot synchronization, the slotsync worker > > > cannot synchronize the slot because the slot's "synced" field on the > > > primary is false. > > > > > > > Yeah, we avoided doing anything in this case because the user could > > have manually created another slot with the same name on standby. > > Unlike WAL slots can be modified on standby as we allow decoding on > > standby, so we can't allow to overwrite the existing slots. We won't > > be able to distinguish whether the existing slot was a slot that the > > user wants to sync with primary or a slot created on standby to > > perform decoding. I think in this case user first needs to drop the > > slot on new standby. > > Yes, but if we do a switch-back further (i.e. in above case, node1 > backs to the primary again and node becomes the standby again), the > user doesn't need to remove failover slots since they are already > marked as "synced". But, I think in this case node-2's timeline will be ahead of node-1, so will we be able to make node-2 follow node-1 again without any additional steps? One thing is not clear to me after promotion the timeline changes in WAL, so the locations in slots will be as per new timelines, after that will it be safe to sync slots from the new primary to old-primary? In general, I think after failover, we recommend running pg_rewind if the old primary has to follow the new primary to account for divergence in WAL. So, not sure we can safely start syncing slots in old-primary from new-primary, consider that in the new primary, the same name slot may have dropped/re-created multiple times. We can probably reset all the fields of the existing slot the first time syncing for an existing slot or do something like that but I think it would be better to just re-create the slot. > I wonder if we could do something automatically to > reduce the user's operation. One possibility is that we forcefully drop/re-create the slot or directly overwrite the slot contents but that would probably be better done via some GUC or slot-level parameter. 
I feel we should leave this for another day. For the first version, we can document that an error will occur if a slot with the same name already exists on the standby, so users need to ensure that no such slot exists on the standby before sync. -- With Regards, Amit Kapila.
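For the documented workaround, a sketch of the manual step on the old primary (the new standby) before it starts syncing -- the slot name is the one from the scenario above, not a fixed name:

SELECT pg_drop_replication_slot('test_sub');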
Hi, On Thu, Feb 01, 2024 at 04:12:43PM +0530, Amit Kapila wrote: > On Thu, Jan 25, 2024 at 11:26 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Wed, Jan 24, 2024 at 04:09:15PM +0530, shveta malik wrote: > > > On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then for > > > > the current feature I think "failover" and "on" should be the values to turn the > > > > feature on (assuming "on" would mean "all kind of supported slots"). > > > > > > Even if others agree and we change this GUC name to > > > "sync_replication_slots", I feel we should keep the values as "on" and > > > "off" currently, where "on" would mean 'sync failover slots' (docs can > > > state that clearly). > > > > I gave more thoughts on it and I think the values should only be "failover" or > > "off". > > > > The reason is that if we allow "on" and change the "on" behavior in future > > versions (to support more than failover slots) then that would change the behavior > > for the ones that used "on". > > > > I again thought on this point and feel that even if we start to sync > say physical slots their purpose would also be to allow > failover/switchover, otherwise, there is no use of syncing the slots. Yeah, I think this is a good point. > So, by that theory, we can just go for naming it as > sync_failover_slots or simply sync_slots with values 'off' and 'on'. > Now, if these are used for switchover then there is an argument that > adding 'failover' in the GUC name could be confusing but I feel > 'failover' is used widely enough that it shouldn't be a problem for > users to understand, otherwise, we can go with simple name like > sync_slots as well. > I agree and "on"/"off" looks enough to me now. As far the GUC name I've the feeling that "replication" should be part of it, and think that sync_replication_slots is fine. The reason behind is that "sync_slots" could be confusing if in the future other kind of "slot" (other than replication ones) are added in the engine. Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > Attached v75 patch-set. Changes are: > > 1) Re-arranged the patches: > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > separated out in v75-001 as those are independent changes. > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > process' and 'App-name changes' are now merged to single patch which > makes v75-002. > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > Document' patches are maintained as is (v75-003 and v75-004 now). Thanks! I only looked at the commit message for v75-0002 and see that it has changed since the comment done in [1], but it still does not look correct to me. " If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped and recreated on the standby in next sync-cycle provided the slot still exists on the primary server. It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). This situation may occur due to the following reasons: - The max_slot_wal_keep_size on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - primary_slot_name is temporarily reset to null and the physical slot is removed. - The primary changes wal_level to a level lower than logical. " If a logical decoding slot "still exists on the primary server" then the primary can not change the wal_level to lower than logical, one would get something like: "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical" and then slots won't get invalidated on the standby. I've the feeling that the wal_level conflict part may need to be explained separately? (I think it's not possible that they end up being re-created on the standby for this conflict, they will be simply removed as it would mean the counterpart one on the primary does not exist anymore). [1]: https://www.postgresql.org/message-id/ZYWdSIeAMQQcLmVT%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
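As a side note, the invalidated-slot situations listed in that commit message can be observed on the standby with something like the query below. This is only a sketch: 'synced' is the column this patch set adds to pg_replication_slots, while wal_status and conflicting already exist.

SELECT slot_name, synced, wal_status, conflicting
  FROM pg_replication_slots
 WHERE synced;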
On Fri, Feb 2, 2024 at 9:50 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v750001. > > ~~~ > > 2. > This patch also implements a new API libpqrcv_get_dbname_from_conninfo() > to extract database name from the given connection-info > > ~ > > /extract database name/the extract database name/ > I think it should be "..extract the database name.." > ====== > .../libpqwalreceiver/libpqwalreceiver.c > > > 4. libpqrcv_connect > > /* > - * Establish the connection to the primary server for XLOG streaming > + * Establish the connection to the primary server. > + * > + * The connection established could be either a replication one or > + * a non-replication one based on input argument 'replication'. And further > + * if it is a replication connection, it could be either logical or physical > + * based on input argument 'logical'. > > That first comment ("could be either a replication one or...") seemed > a bit meaningless (e.g. it like saying "this boolean argument can be > true or false") because it doesn't describe what is the meaning of a > "replication connection" versus what is a "non-replication > connection". > The replication connection is a term already used in the code and docs. For example, see the error message: "pg_hba.conf rejects replication connection for host ..". It means that for communication the connection would use replication protocol instead of the normal (one used by queries) protocol. The other possibility could be to individually explain each parameter but I think that is not what we follow in this or related functions. I feel we can use a simple comment like: "This API can be used for both replication and regular connections." > ~~~ > > 5. > /* We can not have logical without replication */ > Assert(replication || !logical); > > if (replication) > { > keys[++i] = "replication"; > vals[i] = logical ? "database" : "true"; > > if (!logical) > { > /* > * The database name is ignored by the server in replication mode, > * but specify "replication" for .pgpass lookup. > */ > keys[++i] = "dbname"; > vals[i] = "replication"; > } > } > > keys[++i] = "fallback_application_name"; > vals[i] = appname; > if (logical) > { > ... > } > > ~ > > The Assert already says we cannot be 'logical' if not 'replication', > therefore IMO it seemed strange that the code was not refactored to > bring that 2nd "if (logical)" code to within the scope of the "if > (replication)". > > e.g. Can't you do something like this: > > Assert(replication || !logical); > > if (replication) > { > ... > if (logical) > { > ... > } > else > { > ... > } > } > keys[++i] = "fallback_application_name"; > vals[i] = appname; > +1. > ~~~ > > 6. libpqrcv_get_dbname_from_conninfo > > + for (PQconninfoOption *opt = opts; opt->keyword != NULL; ++opt) > + { > + /* > + * If multiple dbnames are specified, then the last one will be > + * returned > + */ > + if (strcmp(opt->keyword, "dbname") == 0 && opt->val && > + opt->val[0] != '\0') > + dbname = pstrdup(opt->val); > + } > > Should you also pfree the old dbname instead of gathering a bunch of > strdups if there happened to be multiple dbnames specified ? > > SUGGESTION > if (strcmp(opt->keyword, "dbname") == 0 && opt->val && *opt->val) > { > if (dbname) > pfree(dbname); > dbname = pstrdup(opt->val); > } > makes sense and shouldn't we need to call PQconninfoFree(opts); at the end of libpqrcv_get_dbname_from_conninfo() similar to libpqrcv_check_conninfo()? -- With Regards, Amit Kapila.
On Fri, Feb 2, 2024 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 2, 2024 at 6:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Feb 1, 2024 at 12:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > BTW I've tested the following switch/fail-back scenario but it seems > > > > not to work fine. Am I missing something? > > > > > > > > Setup: > > > > node1 is the primary, node2 is the physical standby for node1, and > > > > node3 is the subscriber connecting to node1. > > > > > > > > Steps: > > > > 1. [node1]: create a table and a publication for the table. > > > > 2. [node2]: set enable_syncslot = on and start (to receive WALs from node1). > > > > 3. [node3]: create a subscription with failover = true for the publication. > > > > 4. [node2]: promote to the new standby. > > > > 5. [node3]: alter subscription to connect the new primary, node2. > > > > 6. [node1]: stop, set enable_syncslot = on (and other required > > > > parameters), then start as a new standby. > > > > > > > > Then I got the error "exiting from slot synchronization because same > > > > name slot "test_sub" already exists on the standby". > > > > > > > > The logical replication slot that was created on the old primary > > > > (node1) has been synchronized to the old standby (node2). Therefore on > > > > node2, the slot's "synced" field is true. However, once node1 starts > > > > as the new standby with slot synchronization, the slotsync worker > > > > cannot synchronize the slot because the slot's "synced" field on the > > > > primary is false. > > > > > > > > > > Yeah, we avoided doing anything in this case because the user could > > > have manually created another slot with the same name on standby. > > > Unlike WAL slots can be modified on standby as we allow decoding on > > > standby, so we can't allow to overwrite the existing slots. We won't > > > be able to distinguish whether the existing slot was a slot that the > > > user wants to sync with primary or a slot created on standby to > > > perform decoding. I think in this case user first needs to drop the > > > slot on new standby. > > > > Yes, but if we do a switch-back further (i.e. in above case, node1 > > backs to the primary again and node becomes the standby again), the > > user doesn't need to remove failover slots since they are already > > marked as "synced". > > But, I think in this case node-2's timeline will be ahead of node-1, > so will we be able to make node-2 follow node-1 again without any > additional steps? One thing is not clear to me after promotion the > timeline changes in WAL, so the locations in slots will be as per new > timelines, after that will it be safe to sync slots from the new > primary to old-primary? In order for node-1 to go back to the primary again, it needs to be promoted. That is, the node-1's timeline increments and node-2 follows node-1. > > In general, I think after failover, we recommend running pg_rewind if > the old primary has to follow the new primary to account for > divergence in WAL. So, not sure we can safely start syncing slots in > old-primary from new-primary, consider that in the new primary, the > same name slot may have dropped/re-created multiple times. Right. And I missed the point that all replication slots are removed after pg_rewind. It would not be a problem in a failover case. But probably we still need to consider a switchover cas (i.e. switch roles with clean shutdowns) since it doesn't require to run pg_rewind? 
> We can > probably reset all the fields of the existing slot the first time > syncing for an existing slot or do something like that but I think it > would be better to just re-create the slot. > > > > I wonder if we could do something automatically to > > reduce the user's operation. > > One possibility is that we forcefully drop/re-create the slot or > directly overwrite the slot contents but that would probably be better > done via some GUC or slot-level parameter. I feel we should leave this > for another day, for the first version, we can document that an error > will occur if the same name slots on standby exist, so users need to > ensure that there shouldn't be an existing same name slots on standby > before sync. > Hmm, I'm afraid it might not be user-friendly. But probably we can leave it for now as it's not impossible. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Here are some review comments for v750002. (this is a WIP but this is what I found so far...) ====== doc/src/sgml/protocol.sgml 1. > > 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) # > > > > <para> > > - If true, the slot is enabled to be synced to the standbys. > > + If true, the slot is enabled to be synced to the standbys > > + so that logical replication can be resumed after failover. > > </para> > > > > This also should have the sentence "The default is false.", e.g. the > > same as the same option in CREATE_REPLICATION_SLOT says. > > I have not added this. I feel the default value related details should > be present in the 'CREATE' part, it is not meaningful for the "ALTER" > part. ALTER does not have any defaults, it just modifies the options > given by the user. You are correct. My mistake. ====== src/backend/postmaster/bgworker.c 2. #include "replication/logicalworker.h" +#include "replication/worker_internal.h" #include "storage/dsm.h" Is this change needed when the rest of the code is removed? ====== src/backend/replication/logical/slotsync.c 3. synchronize_one_slot > > 3. > > + /* > > + * Make sure that concerned WAL is received and flushed before syncing > > + * slot to target lsn received from the primary server. > > + * > > + * This check will never pass if on the primary server, user has > > + * configured standby_slot_names GUC correctly, otherwise this can hit > > + * frequently. > > + */ > > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > > + if (remote_slot->confirmed_lsn > latestFlushPtr) > > > > BEFORE > > This check will never pass if on the primary server, user has > > configured standby_slot_names GUC correctly, otherwise this can hit > > frequently. > > > > SUGGESTION (simpler way to say the same thing?) > > This will always be the case unless the standby_slot_names GUC is not > > correctly configured on the primary server. > > It is not true. It will not hit this condition "always" but has higher > chances to hit it when standby_slot_names is not configured. I think > you meant 'unless the standby_slot_names GUC is correctly configured'. > I feel the current comment gives clear info (less confusing) and thus > I have not changed it for the time being. I can consider if I get more > comments there. Hmm. I meant what I wrote. The "This" of my suggested text refers to the previous sentence in the comment (not about "hitting" ?? your condition). TBH, regardless of the wording you choose, I think it will be much clearer to move the comment to be inside the if. SUGGESTION /* * Make sure that concerned WAL is received and flushed before syncing * slot to target lsn received from the primary server. */ latestFlushPtr = GetStandbyFlushRecPtr(NULL); if (remote_slot->confirmed_lsn > latestFlushPtr) { /* * Can get here only when if GUC 'standby_slot_names' on the primary * server was not configured correctly. */ ... } ~~~ 4. +static bool +validate_parameters_and_get_dbname(char **dbname) +{ + /* + * A physical replication slot(primary_slot_name) is required on the + * primary to ensure that the rows needed by the standby are not removed + * after restarting, so that the synchronized slot on the standby will not + * be invalidated. 
+ */ + if (PrimarySlotName == NULL || *PrimarySlotName == '\0') + { + ereport(LOG, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_slot_name")); + return false; + } + + /* + * hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. + */ + if (!hot_standby_feedback) + { + ereport(LOG, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be enabled.", "hot_standby_feedback")); + return false; + } + + /* + * Logical decoding requires wal_level >= logical and we currently only + * synchronize logical slots. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + { + ereport(LOG, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"wal_level\" must be >= logical.")); + return false; + } + + /* + * The primary_conninfo is required to make connection to primary for + * getting slots information. + */ + if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') + { + ereport(LOG, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_conninfo")); + return false; + } + + /* + * The slot sync worker needs a database connection for walrcv_exec to + * work. + */ + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (*dbname == NULL) + { + ereport(LOG, + + /* + * translator: 'dbname' is a specific option; %s is a GUC variable + * name + */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("'dbname' must be specified in \"%s\".", "primary_conninfo")); + return false; + } + + return true; +} I wonder if it is better to log all the problems in one go instead of making users stumble onto them one at a time after fixing one and then hitting the next problem. e.g. just set some variable "all_ok = false;" each time instead of all the "return false;" Then at the end of the function just "return all_ok;" ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Feb 1, 2024 at 5:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested > > by you though I have a few minor comments on 0002 and 0004. I was > > thinking about what will be a logical way to split the slot sync > > worker patch (combined result of 0001, 0002, and 0004), and one idea > > occurred to me is that we can have the first patch as > > synchronize_solts() API and the functionality required to implement > > that API then the second patch would be a slot sync worker which uses > > that API to synchronize slots and does all the required validations. > > Any thoughts? > > If we shift 'synchronize_slots()' to the first patch but there is no > caller of it, we may have a compiler warning for the same. The only > way it can be done is if we temporarily add SQL function on standby > which uses 'synchronize_slots()'. This SQL function can then be > removed in later patches where we actually have a caller for > 'synchronize_slots'. > Can such a SQL function say pg_synchronize_slots() which can sync all slots that have a failover flag set be useful in general apart from just writing tests for this new API? I am thinking maybe users want more control over when to sync the slots and write their bgworker or simply do it just before shutdown once (sort of planned switchover) or at some other pre-defined times. BTW, we also have pg_log_standby_snapshot() which otherwise would be done periodically by background processes. > > 1) Re-arranged the patches: > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > separated out in v75-001 as those are independent changes. Bertrand, Sawada-San, and others, do you see a problem with such a split? Can we go ahead with v75_0001 separately after fixing the open comments? -- With Regards, Amit Kapila.
On Fri, Feb 2, 2024 at 10:53 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Feb 01, 2024 at 04:12:43PM +0530, Amit Kapila wrote: > > On Thu, Jan 25, 2024 at 11:26 AM Bertrand Drouvot > > > > I again thought on this point and feel that even if we start to sync > > say physical slots their purpose would also be to allow > > failover/switchover, otherwise, there is no use of syncing the slots. > > Yeah, I think this is a good point. > > > So, by that theory, we can just go for naming it as > > sync_failover_slots or simply sync_slots with values 'off' and 'on'. > > Now, if these are used for switchover then there is an argument that > > adding 'failover' in the GUC name could be confusing but I feel > > 'failover' is used widely enough that it shouldn't be a problem for > > users to understand, otherwise, we can go with simple name like > > sync_slots as well. > > > > I agree and "on"/"off" looks enough to me now. As far the GUC name I've the > feeling that "replication" should be part of it, and think that sync_replication_slots > is fine. The reason behind is that "sync_slots" could be confusing if in the > future other kind of "slot" (other than replication ones) are added in the engine. > +1 for sync_replication_slots with values as 'on'/'off'. -- With Regards, Amit Kapila.
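Assuming the GUC ends up with this name and stays changeable at reload (otherwise a restart would be needed), turning the feature on for a standby would then look like this sketch:

ALTER SYSTEM SET sync_replication_slots = on;
SELECT pg_reload_conf();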
Hi, On Fri, Feb 02, 2024 at 12:25:30PM +0530, Amit Kapila wrote: > On Thu, Feb 1, 2024 at 5:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested > > > by you though I have a few minor comments on 0002 and 0004. I was > > > thinking about what will be a logical way to split the slot sync > > > worker patch (combined result of 0001, 0002, and 0004), and one idea > > > occurred to me is that we can have the first patch as > > > synchronize_solts() API and the functionality required to implement > > > that API then the second patch would be a slot sync worker which uses > > > that API to synchronize slots and does all the required validations. > > > Any thoughts? > > > > If we shift 'synchronize_slots()' to the first patch but there is no > > caller of it, we may have a compiler warning for the same. The only > > way it can be done is if we temporarily add SQL function on standby > > which uses 'synchronize_slots()'. This SQL function can then be > > removed in later patches where we actually have a caller for > > 'synchronize_slots'. > > > > Can such a SQL function say pg_synchronize_slots() which can sync all > slots that have a failover flag set be useful in general apart from > just writing tests for this new API? I am thinking maybe users want > more control over when to sync the slots and write their bgworker or > simply do it just before shutdown once (sort of planned switchover) or > at some other pre-defined times. Big +1 for having this kind of function in user's hands (as the standby's slots may be lagging behind during a switchover for example). As far the name, I think it would make sense to add "replication" or "repl" something like pg_sync_replication_slots()? (that would be aligned with pg_create_logical_replication_slot() and friends). > BTW, we also have > pg_log_standby_snapshot() which otherwise would be done periodically > by background processes. > > > > > 1) Re-arranged the patches: > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > separated out in v75-001 as those are independent changes. > > Bertrand, Sawada-San, and others, do you see a problem with such a > split? Can we go ahead with v75_0001 separately after fixing the open > comments? I think that makes sense, specially if we're also creating a user callable function to sync the slot(s) at wish. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
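With the function named as proposed here, the planned-switchover usage would boil down to something like this on the standby (a sketch of intended usage only, not of the implementation):

SELECT pg_sync_replication_slots();
-- once the slots are confirmed in sync:
SELECT pg_promote();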
On Fri, Feb 2, 2024 at 1:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > +1 for sync_replication_slots with values as 'on'/'off'. Okay. PFA v76 which changes this GUC name as suggested. It also addressed comments from Peter given in [1] and [2]. [1]: https://www.postgresql.org/message-id/CAHut%2BPvFj8ZOx8-YdMWBS9vxMcmgxwOcA%2BYuJVgrayjhsiszHQ%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAHut%2BPtRJ_4x3re0bPn791PTL6kc2TRm1A2EPY1kjTCax_9F%3DA%40mail.gmail.com thanks Shveta
Attachment
On Fri, Feb 2, 2024 at 12:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v750002. Thanks for the feedback Peter. Addressed all in v76 except one. > (this is a WIP but this is what I found so far...) > I wonder if it is better to log all the problems in one go instead of > making users stumble onto them one at a time after fixing one and then > hitting the next problem. e.g. just set some variable "all_ok = > false;" each time instead of all the "return false;" > > Then at the end of the function just "return all_ok;" If we do this way, then we need to find a way to combine the msgs as well, otherwise the same msg will be repeated multiple times. For the concerned functionality (which needs one time config effort by user), I feel the existing way looks okay. We may consider optimizing it if we get more comments here. thanks Shveta
On Friday, February 2, 2024 2:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 1, 2024 at 5:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > Agreed, and I am fine with merging 0001, 0002, and 0004 as suggested > > > by you though I have a few minor comments on 0002 and 0004. I was > > > thinking about what will be a logical way to split the slot sync > > > worker patch (combined result of 0001, 0002, and 0004), and one idea > > > occurred to me is that we can have the first patch as > > > synchronize_solts() API and the functionality required to implement > > > that API then the second patch would be a slot sync worker which > > > uses that API to synchronize slots and does all the required validations. > > > Any thoughts? > > > > If we shift 'synchronize_slots()' to the first patch but there is no > > caller of it, we may have a compiler warning for the same. The only > > way it can be done is if we temporarily add SQL function on standby > > which uses 'synchronize_slots()'. This SQL function can then be > > removed in later patches where we actually have a caller for > > 'synchronize_slots'. > > > > Can such a SQL function say pg_synchronize_slots() which can sync all slots that > have a failover flag set be useful in general apart from just writing tests for this > new API? I am thinking maybe users want more control over when to sync the > slots and write their bgworker or simply do it just before shutdown once (sort > of planned switchover) or at some other pre-defined times. BTW, we also have > pg_log_standby_snapshot() which otherwise would be done periodically by > background processes. Here is an attempt for this. The slotsync worker patch is now splitted into two patches(0002 and 0003). I also adjusted the doc, comments and tests for the new pg_synchronize_slots() function. Best Regards, Hou zj
Attachment
On Monday, February 5, 2024 10:17 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, February 2, 2024 2:56 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Thu, Feb 1, 2024 at 5:29 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > > > On Thu, Feb 1, 2024 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > > > Agreed, and I am fine with merging 0001, 0002, and 0004 as > > > > suggested by you though I have a few minor comments on 0002 and > > > > 0004. I was thinking about what will be a logical way to split the > > > > slot sync worker patch (combined result of 0001, 0002, and 0004), > > > > and one idea occurred to me is that we can have the first patch as > > > > synchronize_solts() API and the functionality required to > > > > implement that API then the second patch would be a slot sync > > > > worker which uses that API to synchronize slots and does all the required > validations. > > > > Any thoughts? > > > > > > If we shift 'synchronize_slots()' to the first patch but there is no > > > caller of it, we may have a compiler warning for the same. The only > > > way it can be done is if we temporarily add SQL function on standby > > > which uses 'synchronize_slots()'. This SQL function can then be > > > removed in later patches where we actually have a caller for > > > 'synchronize_slots'. > > > > > > > Can such a SQL function say pg_synchronize_slots() which can sync all > > slots that have a failover flag set be useful in general apart from > > just writing tests for this new API? I am thinking maybe users want > > more control over when to sync the slots and write their bgworker or > > simply do it just before shutdown once (sort of planned switchover) or > > at some other pre-defined times. BTW, we also have > > pg_log_standby_snapshot() which otherwise would be done periodically > > by background processes. > > Here is an attempt for this. The slotsync worker patch is now splitted into two > patches(0002 and 0003). I also adjusted the doc, comments and tests for the > new pg_synchronize_slots() function. There was one miss in the doc that cause CFbot failure, attach the correct version V77_2 here. There are no code changes compared to V77 version. Best Regards, Hou zj
Attachment
On Fri, Feb 2, 2024 at 11:18 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Feb 2, 2024 at 12:25 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are some review comments for v750002. > > Thanks for the feedback Peter. Addressed all in v76 except one. > > > (this is a WIP but this is what I found so far...) > > > I wonder if it is better to log all the problems in one go instead of > > making users stumble onto them one at a time after fixing one and then > > hitting the next problem. e.g. just set some variable "all_ok = > > false;" each time instead of all the "return false;" > > > > Then at the end of the function just "return all_ok;" > > If we do this way, then we need to find a way to combine the msgs as > well, otherwise the same msg will be repeated multiple times. For the > concerned functionality (which needs one time config effort by user), > I feel the existing way looks okay. We may consider optimizing it if > we get more comments here. > I don't think combining messages is necessary; I considered these all as different (not the same msg repeated multiple times) since they all have different errhints. I felt a user would only know to make a configuration correction when they are informed something is wrong, so my review point was we could tell them all the wrong things up-front so then those can all be fixed with a "one time config effort by user". Otherwise, if multiple settings (e.g. from the list below) have wrong values, I imagined the user will fix the first reported one, then the next bad config will be reported, then the user will fix that one, then the next bad config will be reported, then the user will fix that one, and so on. It just seemed potentially/unnecessarilly painful. - errhint("\"%s\" must be defined.", "primary_slot_name")); - errhint("\"%s\" must be enabled.", "hot_standby_feedback")); - errhint("\"wal_level\" must be >= logical.")); - errhint("\"%s\" must be defined.", "primary_conninfo")); - errhint("'dbname' must be specified in \"%s\".", "primary_conninfo")); ~ Anyway, I just wanted to explain my review comment some more because maybe my reason wasn't clear the first time. Whatever your decision is, it is fine by me. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thursday, February 1, 2024 12:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 1, 2024 at 8:15 AM Euler Taveira <euler@eulerto.com> wrote: > > > > On Mon, Jan 29, 2024, at 10:17 AM, Zhijie Hou (Fujitsu) wrote: > > > > Attach the V72-0001 which addressed above comments, other patches will > be > > rebased and posted after pushing first patch. Thanks Shveta for helping > address > > the comments. > > > > > > While working on another patch I noticed a new NOTICE message: > > > > NOTICE: changed the failover state of replication slot "foo" on publisher to > false > > > > I wasn't paying much attention to this thread then I start reading the 2 > > patches that was recently committed. The message above surprises me > because > > pg_createsubscriber starts to emit this message. The reason is that it doesn't > > create the replication slot during the CREATE SUBSCRIPTION. Instead, it > creates > > the replication slot with failover = false and no such option is informed > > during CREATE SUBSCRIPTION which means it uses the default value (failover > = > > false). I expect that I don't see any message because it is *not* changing the > > behavior. I was wrong. It doesn't check the failover state on publisher, it > > just executes walrcv_alter_slot() and emits a message. > > > > IMO if we are changing an outstanding property on node A from node B, > node B > > already knows (or might know) about that behavior change (because it is > sending > > the command), however, node A doesn't (unless log_replication_commands > = on -- > > it is not the default). > > > > Do we really need this message as NOTICE? > > > > The reason for adding this NOTICE was to keep it similar to other > Notice messages in these commands like create/drop slot. However, here > the difference is we may not have altered the slot as the property is > already the same as we want to set on the publisher. So, I am not sure > whether we should follow the existing behavior or just get rid of it. > And then do we remove similar NOTICE in AlterSubscription() as well? > Normally, I think NOTICE intends to let users know if we did anything > with slots while executing subscription commands. Does anyone else > have an opinion on this point? > > A related point, I think we can avoid setting the 'failover' property > in ReplicationSlotAlter() if it is not changed, the advantage is we > will avoid saving slots. OTOH, this won't be a frequent operation so > we can leave it as it is as well. Here is a patch to remove the NOTICE and improve the ReplicationSlotAlter. The patch also includes few cleanups based on Peter's feedback. Best Regards, Hou zj
Attachment
On Mon, Feb 5, 2024 at 1:29 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> On Monday, February 5, 2024 10:17 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> There was one miss in the doc that cause CFbot failure,
> attach the correct version V77_2 here. There are no code changes compared to V77 version.
>
> Best Regards,
> Hou zj
Just noticed that doc/src/sgml/config.sgml still refers to enable_syncslot instead of sync_replication_slots:
The standbys corresponding to the physical replication slots in
<varname>standby_slot_names</varname> must configure
<literal>enable_syncslot = true</literal> so they can receive
failover logical slots changes from the primary.
regards,
Ajin Cherian
Fujitsu Australia
On Mon, Feb 5, 2024 at 7:59 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > I have pushed the first patch. Next, a few comments on 0002 are as follows: 1. +static bool +validate_parameters_and_get_dbname(char **dbname, int elevel) For 0002, we don't need dbname as out parameter. Also, we can rename the function to validate_slotsync_params() or something like that. Also, for 0003, we don't need to get the dbname from wait_for_valid_params_and_get_dbname(), instead there could be a common function that can be invoked from validate_slotsync_params() and caller of wait function that caches the value of dbname. The other parameter elevel is also not required for 0002. 2. + /* + * Make sure that concerned WAL is received and flushed before syncing + * slot to target lsn received from the primary server. + */ + latestFlushPtr = GetStandbyFlushRecPtr(NULL); + if (remote_slot->confirmed_lsn > latestFlushPtr) + { + /* + * Can get here only if GUC 'standby_slot_names' on the primary server + * was not configured correctly. + */ + ereport(LOG, + errmsg("skipping slot synchronization as the received slot sync" + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + remote_slot->name, + LSN_FORMAT_ARGS(latestFlushPtr))); + + return false; In the case of a function invocation, this should be an ERROR. We can move the comment related to 'standby_slot_names' to a later patch where that GUC is introduced. See, if there are other LOGs in the patch that needs to be converted to ERROR. 3. The function pg_sync_replication_slots() should be in file slotfuncs.c and common functionality between this function and slotsync worker can be exposed via a function in slotsync.c. 4. /* + * Using the specified primary server connection, check whether we are + * cascading standby and validates primary_slot_name for + * non-cascading-standbys. + */ + check_primary_info(wrconn, &am_cascading_standby, + &primary_slot_invalid, ERROR); + + if (am_cascading_standby) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot synchronize replication slots to a cascading standby")); primary_slot_invalid is not used in this patch. I think we can allow the function can be executed on cascading_standby as well because this will be used for the planned switchover. 5. I don't see any problem with allowing concurrent processes trying to sync the same slot at the same time as each process will acquire the slot and only one process can acquire the slot at a time, the other will get an ERROR. -- With Regards, Amit Kapila.
On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian <itsajin@gmail.com> wrote: > > Just noticed that doc/src/sgml/config.sgml still refers to enable_syncslot instead of sync_replication_slots: > > The standbys corresponding to the physical replication slots in > <varname>standby_slot_names</varname> must configure > <literal>enable_syncslot = true</literal> so they can receive > failover logical slots changes from the primary.

Thanks Ajin for pointing this out. Here are the v78 patches with that corrected.

Other changes are:
1) Rebased the patches as the v77-001 is now pushed.
2) Enabled executing pg_sync_replication_slots() on a cascading standby.
3) Rearranged the code around parameter validity checks. Changed function names and changed how dbname is extracted, as suggested by Amit offlist.
4) Rearranged the code around check_primary_info(). Removed output args.
5) Few other trivial changes.

thanks Shveta
Attachment
On Mon, Feb 5, 2024 at 4:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > I have pushed the first patch. Next, a few comments on 0002 are as follows: Thanks for the feedback Amit. Some of these are addressed in v78. Rest will be addressed in the next version. > 1. > +static bool > +validate_parameters_and_get_dbname(char **dbname, int elevel) > > For 0002, we don't need dbname as out parameter. Also, we can rename > the function to validate_slotsync_params() or something like that. > Also, for 0003, we don't need to get the dbname from > wait_for_valid_params_and_get_dbname(), instead there could be a > common function that can be invoked from validate_slotsync_params() > and caller of wait function that caches the value of dbname. > > The other parameter elevel is also not required for 0002. > > 2. > + /* > + * Make sure that concerned WAL is received and flushed before syncing > + * slot to target lsn received from the primary server. > + */ > + latestFlushPtr = GetStandbyFlushRecPtr(NULL); > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + { > + /* > + * Can get here only if GUC 'standby_slot_names' on the primary server > + * was not configured correctly. > + */ > + ereport(LOG, > + errmsg("skipping slot synchronization as the received slot sync" > + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > + remote_slot->name, > + LSN_FORMAT_ARGS(latestFlushPtr))); > + > + return false; > > In the case of a function invocation, this should be an ERROR. We can > move the comment related to 'standby_slot_names' to a later patch > where that GUC is introduced. See, if there are other LOGs in the > patch that needs to be converted to ERROR. > > 3. The function pg_sync_replication_slots() should be in file > slotfuncs.c and common functionality between this function and > slotsync worker can be exposed via a function in slotsync.c. > > 4. > /* > + * Using the specified primary server connection, check whether we are > + * cascading standby and validates primary_slot_name for > + * non-cascading-standbys. > + */ > + check_primary_info(wrconn, &am_cascading_standby, > + &primary_slot_invalid, ERROR); > + > + if (am_cascading_standby) > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot synchronize replication slots to a cascading standby")); > > primary_slot_invalid is not used in this patch. I think we can allow > the function can be executed on cascading_standby as well because this > will be used for the planned switchover. > > 5. I don't see any problem with allowing concurrent processes trying > to sync the same slot at the same time as each process will acquire > the slot and only one process can acquire the slot at a time, the > other will get an ERROR. > > -- > With Regards, > Amit Kapila.
On Mon, Feb 5, 2024 at 8:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > Just noticed that doc/src/sgml/config.sgml still refers to enable_synclot instead of sync_replication_slots: > > > > The standbys corresponding to the physical replication slots in > > <varname>standby_slot_names</varname> must configure > > <literal>enable_syncslot = true</literal> so they can receive > > failover logical slots changes from the primary. > > Thanks Ajin for pointing this out. Here are v78 patches, corrected there. > > Other changes are: > > 1) Rebased the patches as the v77-001 is now pushed. > 2) Enabled executing pg_sync_replication_slots() on cascading-standby. > 3) Rearranged the code around parameter validity checks. Changed > function names and changed the way how dbname is extracted as > suggested by Amit offlist. > 4) Rearranged the code around check_primary_info(). Removed output args. > 5) Few other trivial changes. > Thank you for updating the patch! Here are some comments: --- Since Two processes (e.g. the slotsync worker and pg_sync_replication_slots()) concurrently fetch and update the slot information, there is a race condition where slot's confirmed_flush_lsn goes backward. . We have the following check but it doesn't prevent the slot's confirmed_flush_lsn from moving backward if the restart_lsn does't change: /* * Sanity check: As long as the invalidations are handled * appropriately as above, this should never happen. */ if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, "cannot synchronize local slot \"%s\" LSN(%X/%X)" " to remote slot's LSN(%X/%X) as synchronization" " would move it backwards", remote_slot->name, LSN_FORMAT_ARGS(slot->data.restart_lsn), LSN_FORMAT_ARGS(remote_slot->restart_lsn)); --- + It is recommended that subscriptions are first disabled before promoting f+ the standby and are enabled back after altering the connection string. I think it's better to describe the reason why it's recommended to disable subscriptions before the standby promotion. --- +/* Slot sync worker objects */ +extern PGDLLIMPORT char *PrimaryConnInfo; +extern PGDLLIMPORT char *PrimarySlotName; These two variables are declared also in xlogrecovery.h. Is it intentional? If so, I think it's better to write comments. --- Global functions and variables used by the slotsync worker are declared in logicalworker.h and worker_internal.h. But is it really okay to make a dependency between the slotsync worker and logical replication workers? IIUC the slotsync worker is conceptually a separate feature from the logical replication. I think the slotsync worker can have its own header file. --- + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname and + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false) If we use CONCAT function, we can replace '||' with ','. --- + Confirm that the standby server is not lagging behind the subscribers. + This step can be skipped if + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + has been correctly configured. How can the user confirm if standby_slot_names is correctly configured? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
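Regarding the last point, one possible way (only a sketch) to eyeball "the standby is not lagging behind the subscribers" on the primary is to compare the failover slots' confirmed_flush_lsn with the physical standby's reported flush position. The 'failover' column is the one added earlier in this thread; 'standby1' below is a placeholder application_name for the standby's walreceiver connection:

SELECT s.slot_name, s.confirmed_flush_lsn, r.flush_lsn,
       s.confirmed_flush_lsn <= r.flush_lsn AS standby_caught_up
  FROM pg_replication_slots s
       CROSS JOIN pg_stat_replication r
 WHERE s.failover AND r.application_name = 'standby1';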
On Friday, February 2, 2024 2:03 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > > Attached v75 patch-set. Changes are: > > > > 1) Re-arranged the patches: > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > separated out in v75-001 as those are independent changes. > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > > process' and 'App-name changes' are now merged to single patch which > > makes v75-002. > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > > Document' patches are maintained as is (v75-003 and v75-004 now). > > Thanks! > > I only looked at the commit message for v75-0002 and see that it has changed > since the comment done in [1], but it still does not look correct to me. > > " > If a logical slot on the primary is valid but is invalidated on the standby, then > that slot is dropped and recreated on the standby in next sync-cycle provided > the slot still exists on the primary server. It is okay to recreate such slots as long > as these are not consumable on the standby (which is the case currently). This > situation may occur due to the following reasons: > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > records from the restart_lsn of the slot. > - primary_slot_name is temporarily reset to null and the physical slot is > removed. > - The primary changes wal_level to a level lower than logical. > " > > If a logical decoding slot "still exists on the primary server" then the primary > can not change the wal_level to lower than logical, one would get something > like: > > "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical" > > and then slots won't get invalidated on the standby. I've the feeling that the > wal_level conflict part may need to be explained separately? (I think it's not > possible that they end up being re-created on the standby for this conflict, > they will be simply removed as it would mean the counterpart one on the > primary does not exist anymore). This is possible in some extreme cases, because the slot is synced asynchronously. For example: If on the primary the wal_level is changed to 'replica' and then changed back to 'logical', the standby would receive two XLOG_PARAMETER_CHANGE wals. And before the standby replay these wals, user can create a failover slot on the primary because the wal_level is logical, and if the slotsync worker has synced the slots before startup process replay the XLOG_PARAMETER_CHANGE, then when replaying the XLOG_PARAMETER_CHANGE, the just synced slot will be invalidated. Although I think it doesn't seem a real world case, so I am not sure is it worth separate explanation. Best Regards, Hou zj
Here are some review comments for v78-0001 ====== GENERAL 1. Should the "Chapter 30 Logical Replication" at least have another section that mentions the feature of slot synchronization so the information about it is easier to find? It doesn't need to say much -- just give a reference to the other sections where it is explained already. ====== Commit Message 2. A new 'synced' flag is introduced for replication slots, indicating whether the slot has been synchronized from the primary server. On a standby, synced slots cannot be dropped or consumed, and any attempt to perform logical decoding on them will result in an error. ~ It doesn't say *where* is this new 'synced' flag. ~~~ 3. The logical replication slots on the primary can be synchronized to the hot standby by enabling the failover option during slot creation and calling pg_sync_replication_slots() function on the standby. For the synchronization to work, it is mandatory to have a physical replication slot between the primary and the standby, hot_standby_feedback must be enabled on the standby and a valid dbname must be specified in primary_conninfo. ~ 3a. "by enabling the failover option during slot creation" -- Should you elaborate more about that part by mentioning the failover parameter of the create slot API, or the "failover" option of the CREATE SUBSCRIPTION? ~ 3b. I find it easy to read if the GUC parameters are quoted, but YMMV. /hot_standby_feedback/'hot_standby_feedback'/ /primary_conninfo/'primary_conninfo'/ ~~~ 4. If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped and can be recreated on the standby in next pg_sync_replication_slots() call provided the slot still exists on the primary server. It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). This situation may occur due to the following reasons: - The max_slot_wal_keep_size on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - primary_slot_name is temporarily reset to null and the physical slot is removed. - The primary changes wal_level to a level lower than logical. ~ 4a. /and can be recreated/but will be recreated/ ~ 4b. (As before, I would quote the GUCs for easier readability) /max_slot_wal_keep_size/'max_slot_wal_keep_size'/ /primary_slot_name/'primary_slot_name'/ /'wal_level'/wal_level/ ====== doc/src/sgml/config.sgml 5. + <para> + To synchronize replication slots (see + <xref linkend="logicaldecoding-replication-slots-synchronization"/>), + it is also necessary to specify a valid <literal>dbname</literal> + in the <varname>primary_conninfo</varname> string. This will only be + used for slot synchronization. It is ignored for streaming. </para> Somehow, I thought the below wording is slightly better (and it also matches the linked section title). YMMV. /To synchronize replication slots/For replication slot synchronization/ ====== src/sgml/func.sgml 6. + <row> + <entry id="pg-sync-replication-slots" role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>pg_sync_replication_slots</primary> Currently, this is in section "9.27.6 Replication Management Functions", but I wondered if it should also have some mention in the "9.27.4. Recovery Control Functions" section. ====== doc/src/sgml/logicaldecoding.sgml 7. 
+ <title>Replication Slot Synchronization</title> + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the <literal>failover</literal> option during slot + creation and calling <function>pg_sync_replication_slots</function> + on the standby. For the synchronization + to work, it is mandatory to have a physical replication slot between the + primary and the standby, and + <link linkend="guc-hot-standby-feedback"><varname>hot_standby_feedback</varname></link> + must be enabled on the standby. It is also necessary to specify a valid + <literal>dbname</literal> in the + <link linkend="guc-primary-conninfo"><varname>primary_conninfo</varname></link>. + </para> 7a. Should you elaborate more about the "enabling the failover option during slot creation" part by mentioning the failover parameter of the create slot API, or the "failover" option of the CREATE SUBSCRIPTION? ~ 7b. I think it will be better to include a link to the pg_sync_replication_slots function. ~~~ 8. + <para> + To resume logical replication after failover from the synced logical + slots, the subscription's 'conninfo' must be altered to point to the + new primary server. This is done using + <link linkend="sql-altersubscription-params-connection"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>. + It is recommended that subscriptions are first disabled before promoting + the standby and are enabled back after altering the connection string. + </para> /and are enabled back/and are re-enabled/ ====== src/backend/replication/logical/slotsync.c 9. + * This file contains the code for slot synchronization on a physical standby + * to fetch logical failover slots information from the primary server, create + * the slots on the standby and synchronize them periodically. IIUC there is no "periodically" logic in this patch 0001 anymore because that is now in a later patch, so this part of the comment maybe needs adjustment. ~~~ 10. + * While creating the slot on physical standby, if the local restart_lsn and/or + * local catalog_xmin is ahead of those on the remote then we cannot create the + * local slot in sync with the primary server because that would mean moving + * the local slot backwards and the standby might not have WALs retained for + * old LSN. In this case, the slot will be marked as RS_TEMPORARY. Once the + * primary server catches up, the slot will be marked as RS_PERSISTENT (which + * means sync-ready) and we can perform the sync periodically. 10a. The wording "While creating the slot [...] then we cannot create the local slot" sounds strange. Maybe it can be reworded like SUGGESTION If the physical standby restart_lsn and/or local catalog_xmin is ahead of those on the remote then we cannot create the local standby slot in sync with the primary server because... ~ 10b. /and we can perform the sync periodically./after which we can call pg_sync_replication_slots() periodically to perform syncs./ ~~~ 11. + * The slots that were synchronized will be dropped if they are currently not + * needed to be synchronized. SUGGESTION Any standby synchronized slots will be dropped if they no longer need to be synchronized. See comment atop drop_obsolete_slots() for more details. ~~~ 12. +static bool +local_slot_update(RemoteSlot * remote_slot, Oid remote_dbid) Space after the pointer (*)? ~~~ 13. +/* + * Drop obsolete slots + * + * Drop the slots that no longer need to be synced i.e. these either do not + * exist on the primary or are no longer enabled for failover. 
+ * + * Additionally, it drops slots that are valid on the primary but got + * invalidated on the standby. This situation may occur due to the following + * reasons: + * - The max_slot_wal_keep_size on the standby is insufficient to retain WAL + * records from the restart_lsn of the slot. + * - primary_slot_name is temporarily reset to null and the physical slot is + * removed. + * - The primary changes wal_level to a level lower than logical. + * + * The assumption is that these dropped slots will get recreated in next + * sync-cycle and it is okay to drop and recreate such slots as long as these + * are not consumable on the standby (which is the case currently). + */ 13a. /Additionally, it drops slots/Additionally, drop any slots/ ~ 13b. /max_slot_wal_keep_size/'max_slot_wal_keep_size'/ /primary_slot_name/'primary_slot_name'/ /wal_level/'wal_level'/ ~ 13c. /The assumption is/The assumptions are/ ~~~ 14. +static bool +update_and_persist_slot(RemoteSlot * remote_slot, Oid remote_dbid) Space after the pointer (*)? ~~~ 15. +static bool +synchronize_one_slot(RemoteSlot * remote_slot, Oid remote_dbid) Space after the pointer (*)? ~~~ 16. + if (remote_slot->confirmed_lsn > latestFlushPtr) + { + /* + * Can get here only if GUC 'standby_slot_names' on the primary server + * was not configured correctly. + */ + ereport(ERROR, + errmsg("skipping slot synchronization as the received slot sync" + " LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X", + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), + remote_slot->name, + LSN_FORMAT_ARGS(latestFlushPtr))); + + return false; + } Unreachable return false after ERROR? ~~~ 17. +/* + * Using the specified primary server connection, validates primary_slot_name. + */ The comment seems expressed in a backward way. SUGGESTION Validate the 'primary_slot_name' using the specified primary server connection. ~~~ 18. +static void +validate_primary_slot(WalReceiverConn *wrconn, int slot_invalid_elevel) I think here it is the "configuration" that is wrong, not the "slot". So I suggest removing that word slot from the parameter. /slot_invalid_elevel/invalid_elevel/ ~~~ 19. +/* + * Returns true if all necessary GUCs for slot synchronization are set + * appropriately, otherwise returns false. + */ +static bool +validate_slotsync_params(int elevel) 19a. /Returns true/Return true/ /returns false/return false/ ~ 19b. IMO for consistency better to use the same param name as the previous function /elevel/invalid_elevel/ ~~~ 20. +Datum +pg_sync_replication_slots(PG_FUNCTION_ARGS) +{ + WalReceiverConn *wrconn = NULL; + char *err; + StringInfoData app_name; The wrconn assignment at declaration seems unnecessary since it will be immediately overwritten on the first usage. ~~~ 21. + if (cluster_name[0]) + appendStringInfo(&app_name, "%s_%s", cluster_name, "slotsync"); + else + appendStringInfo(&app_name, "%s", "slotsync"); I wondered why this was coded using format string substitutions instead of like below: if (cluster_name[0]) appendStringInfo(&app_name, "%s_slotsync", cluster_name); else appendStringInfoString(&app_name, "slotsync"); OR if (cluster_name[0]) appendStringInfo(&app_name, "%s_", cluster_name); appendStringInfoString(&app_name, "slotsync"); ~~~ 22. + /* + * Establish the connection to the primary server for slots + * synchronization. + */ + wrconn = walrcv_connect(PrimaryConnInfo, false, false, false, + app_name.data, &err); Unnecessarily verbose? SUGGESTION Connect to the primary server. ~~~ 23. 
+ syncing_slots = true; + + PG_TRY(); + { + /* + * Using the specified primary server connection, validates the slot + * in primary_slot_name. + */ + validate_primary_slot(wrconn, ERROR); + + (void) synchronize_slots(wrconn); + } + PG_FINALLY(); + { + syncing_slots = false; + walrcv_disconnect(wrconn); + } + PG_END_TRY(); 23a. IMO the "syncing_slots = true;" can be deferred until immediately before call to synchronize_slots(); ~ 23b. I felt the comment seems backwards, so can be worded as suggested elsewhere in this post. SUGGESTION Validate the 'primary_slot_name' using the specified primary server connection. OTOH, if you can change the function name to validate_primary_slot_name() then no comment is needed because then it becomes self-explanatory. ====== src/backend/replication/slot.c 24. + /* + * Do not allow users to create the slots with failover enabled on the + * standby as we do not support sync to the cascading standby. + * + * Slots with failover enabled can still be created when doing slot + * synchronization, as it needs to maintain this value in sync with the + * remote slots. + */ + if (failover && RecoveryInProgress() && !IsSyncingReplicationSlots()) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot enable failover for a replication slot" + " created on the standby")); I felt it started to become confusing using "synchronization" and "sync" in the same sentence. SUGGESTION However, slots with failover enabled can be created during slot synchronization because we need to retain the same values as the remote slot. ====== .../t/040_standby_failover_slots_sync.pl 25. + +$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); Since this is where we use the function added by this patch, it deserves to have a comment. SUGGESTION # Synchronize the primary server slots to the standby. ====== src/tools/pgindent/typedefs.list 26. It looks like 'RemoteSlot' should be included in the typedefs.list file. Probably this is the explanation for the space problems I reported earlier in this post. ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi, Previously ([1] #19 and #22) I had suggested that some conflict_reason code could be split off and pushed first as a prerequisite/ preparatory/ independent patch. At the time, my suggestion was premature because there was still a lot of code under development. But now the affected code is in the first patch 00001, and there are already other precedents of slot-sync preparatory patches getting pushed. So I thought to resurrect this splitting suggestion again, as perhaps is the right time to do it. Details are below: ====== IMO the new #defines for the conflict_reason stuff plus where they get used can be pushed as an independent patch. Specifically, this stuff: ~~~ From src/include/replication/slot.h: +/* + * The possible values for 'conflict_reason' returned in + * pg_get_replication_slots. + */ +#define SLOT_INVAL_WAL_REMOVED_TEXT "wal_removed" +#define SLOT_INVAL_HORIZON_TEXT "rows_removed" +#define SLOT_INVAL_WAL_LEVEL_TEXT "wal_level_insufficient" + ~~~ From src/backend/replication/logical/slotsync.c: Also, IMO this function should live in slot.c; Although slotsync.c might be the only caller, this is not really a slot-sync specific function. +/* + * Maps the pg_replication_slots.conflict_reason text value to + * ReplicationSlotInvalidationCause enum value + */ +static ReplicationSlotInvalidationCause +get_slot_invalidation_cause(char *conflict_reason) +{ + Assert(conflict_reason); + + if (strcmp(conflict_reason, SLOT_INVAL_WAL_REMOVED_TEXT) == 0) + return RS_INVAL_WAL_REMOVED; + else if (strcmp(conflict_reason, SLOT_INVAL_HORIZON_TEXT) == 0) + return RS_INVAL_HORIZON; + else if (strcmp(conflict_reason, SLOT_INVAL_WAL_LEVEL_TEXT) == 0) + return RS_INVAL_WAL_LEVEL; + else + Assert(0); + + /* Keep compiler quiet */ + return RS_INVAL_NONE; +} ~~~ From src/backend/replication/slotfuncs.c: case RS_INVAL_WAL_REMOVED: - values[i++] = CStringGetTextDatum("wal_removed"); + values[i++] = CStringGetTextDatum(SLOT_INVAL_WAL_REMOVED_TEXT); break; case RS_INVAL_HORIZON: - values[i++] = CStringGetTextDatum("rows_removed"); + values[i++] = CStringGetTextDatum(SLOT_INVAL_HORIZON_TEXT); break; case RS_INVAL_WAL_LEVEL: - values[i++] = CStringGetTextDatum("wal_level_insufficient"); + values[i++] = CStringGetTextDatum(SLOT_INVAL_WAL_LEVEL_TEXT); break; ~~~ Thoughts? ====== [1] https://www.postgresql.org/message-id/CAHut%2BPtJAAPghc4GPt0k%3DjeMz1qu4H7mnaDifOHsVsMqi-qOLA%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
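For reference, those three string constants are exactly what users see in the view, so centralizing them would keep slotfuncs.c, slotsync.c, and the documentation consistent. A sketch of how the values surface (slot names and output rows are illustrative only):

  SELECT slot_name, conflict_reason FROM pg_replication_slots;
  --  slot_name  |    conflict_reason
  -- ------------+------------------------
  --  lsub1_slot | wal_removed
  --  lsub2_slot | rows_removed
  --  lsub3_slot | wal_level_insufficient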
On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > --- > Since Two processes (e.g. the slotsync worker and > pg_sync_replication_slots()) concurrently fetch and update the slot > information, there is a race condition where slot's > confirmed_flush_lsn goes backward. > Right, this is possible, though there shouldn't be a problem because anyway, slotsync is an async process. Till we hold restart_lsn, the required WAL won't be removed. Having said that, I can think of two ways to avoid it: (a) We can have some flag in shared memory using which we can detect whether any other process is doing slot syncronization and then either error out at that time or simply wait or may take nowait kind of parameter from user to decide what to do? If this is feasible, we can simply error out for the first version and extend it later if we see any use cases for the same (b) similar to restart_lsn, if confirmed_flush_lsn is getting moved back, raise an error, this is good for now but in future we may still have another similar issue, so I would prefer (a) among these but I am fine if you prefer (b) or have some other ideas like just note down in comments that this is a harmless case and can happen only very rarely. > > --- > + It is recommended that subscriptions are first disabled before promoting > f+ the standby and are enabled back after altering the connection string. > > I think it's better to describe the reason why it's recommended to > disable subscriptions before the standby promotion. > Agreed. The reason I see for this is that if we don't disable the subscription before promotion and changing the connection string there is a chance that the old primary comes back and the subscriber can have some additional data, though the chances of same are less. > --- > +/* Slot sync worker objects */ > +extern PGDLLIMPORT char *PrimaryConnInfo; > +extern PGDLLIMPORT char *PrimarySlotName; > > These two variables are declared also in xlogrecovery.h. Is it > intentional? If so, I think it's better to write comments. > > --- > Global functions and variables used by the slotsync worker are > declared in logicalworker.h and worker_internal.h. But is it really > okay to make a dependency between the slotsync worker and logical > replication workers? IIUC the slotsync worker is conceptually a > separate feature from the logical replication. I think the slotsync > worker can have its own header file. > +1. > > --- > + Confirm that the standby server is not lagging behind the subscribers. > + This step can be skipped if > + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> > + has been correctly configured. > > How can the user confirm if standby_slot_names is correctly configured? > I think users can refer to LOGs to see if it has changed since the first time it was configured. I tried by existing parameter and see the following in LOG: LOG: received SIGHUP, reloading configuration files 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on" If the user can't confirm then it is better to follow the steps mentioned in the patch. Do you want something else to be written in docs for this? If so, what? -- With Regards, Amit Kapila.
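Besides the server log, the currently loaded value can be inspected directly on the primary once the standby_slot_names GUC from this patch set is in place (a sketch; this confirms what the running server is using, though not that the listed slots are the intended ones):

  -- value the running server is using
  SHOW standby_slot_names;
  -- value currently present in the configuration file (useful after editing, before or after a reload)
  SELECT name, setting, applied FROM pg_file_settings WHERE name = 'standby_slot_names';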
Hi, I took another high-level look at all the funtion names of the slotsync.c file. ====== src/backend/replication/logical/slotsync.c +static bool +local_slot_update(RemoteSlot * remote_slot, Oid remote_dbid) +static List * +get_local_synced_slots(void) +static bool +check_sync_slot_on_remote(ReplicationSlot *local_slot, List *remote_slots, +static void +drop_obsolete_slots(List *remote_slot_list) +static void +reserve_wal_for_slot(XLogRecPtr restart_lsn) +static bool +update_and_persist_slot(RemoteSlot * remote_slot, Oid remote_dbid) +static bool +synchronize_one_slot(RemoteSlot * remote_slot, Oid remote_dbid) +get_slot_invalidation_cause(char *conflict_reason) +static bool +synchronize_slots(WalReceiverConn *wrconn) +static void +validate_primary_slot(WalReceiverConn *wrconn, int slot_invalid_elevel) +static bool +validate_slotsync_params(int elevel) +bool +IsSyncingReplicationSlots(void) +Datum +pg_sync_replication_slots(PG_FUNCTION_ARGS) ~~~ There seems some muddling of names here: - "local" versus ? and "remote" versus "primary"; or sometimes the function does not give an indication. - "sync_slot" versus "synced_slot" versus nothing - "check" versus "validate" - etc. Below are some suggestions (some are unchanged); probably there are better ideas for names but my point is that the current names could be improved: CURRENT SUGGESTION get_local_synced_slots get_local_synced_slots check_sync_slot_on_remote check_local_synced_slot_exists_on_remote drop_obsolete_slots drop_local_synced_slots reserve_wal_for_slot reserve_wal_for_local_slot local_slot_update update_local_synced_slot update_and_persist_slot update_and_persist_local_synced_slot get_slot_invalidation_cause get_slot_conflict_reason synchronize_slots synchronize_remote_slots_to_local synchronize_one_slot synchronize_remote_slot_to_local validate_primary_slot check_remote_synced_slot_exists validate_slotsync_params check_local_config IsSyncingReplicationSlots IsSyncingReplicationSlots pg_sync_replication_slots pg_sync_replication_slots ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi, On Tue, Feb 06, 2024 at 03:19:11AM +0000, Zhijie Hou (Fujitsu) wrote: > On Friday, February 2, 2024 2:03 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote: > > > Attached v75 patch-set. Changes are: > > > > > > 1) Re-arranged the patches: > > > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are > > > separated out in v75-001 as those are independent changes. > > > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special > > > process' and 'App-name changes' are now merged to single patch which > > > makes v75-002. > > > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation > > > Document' patches are maintained as is (v75-003 and v75-004 now). > > > > Thanks! > > > > I only looked at the commit message for v75-0002 and see that it has changed > > since the comment done in [1], but it still does not look correct to me. > > > > " > > If a logical slot on the primary is valid but is invalidated on the standby, then > > that slot is dropped and recreated on the standby in next sync-cycle provided > > the slot still exists on the primary server. It is okay to recreate such slots as long > > as these are not consumable on the standby (which is the case currently). This > > situation may occur due to the following reasons: > > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > > records from the restart_lsn of the slot. > > - primary_slot_name is temporarily reset to null and the physical slot is > > removed. > > - The primary changes wal_level to a level lower than logical. > > " > > > > If a logical decoding slot "still exists on the primary server" then the primary > > can not change the wal_level to lower than logical, one would get something > > like: > > > > "FATAL: logical replication slot "logical_slot" exists, but wal_level < logical" > > > > and then slots won't get invalidated on the standby. I've the feeling that the > > wal_level conflict part may need to be explained separately? (I think it's not > > possible that they end up being re-created on the standby for this conflict, > > they will be simply removed as it would mean the counterpart one on the > > primary does not exist anymore). > > This is possible in some extreme cases, because the slot is synced > asynchronously. > > For example: If on the primary the wal_level is changed to 'replica' It means that all the logical slots have been dropped on the primary (if not, it's not possible to change it to a level < logical). > and then > changed back to 'logical', the standby would receive two XLOG_PARAMETER_CHANGE > wals. And before the standby replay these wals, user can create a failover slot And now it is re-created. So the slot has been dropped and recreated on the primary, to it's kind of expected it is also dropped and re-created on the standby (should it be invalidated or not). > Although I think it doesn't seem a real world case, so I am not sure is it worth > separate explanation. Yeah, I don't think your example is worth a separate explanation also because it's expected to see the slot being dropped / re-created anyway (see above). That said, I still think the commit message needs some re-wording, what about? ===== If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped and can be recreated on the standby in next pg_sync_replication_slots() call provided the slot still exists on the primary server. 
It is okay to recreate such slots as long as these are not consumable on the standby (which is the case currently). This situation may occur due to the following reasons: - The max_slot_wal_keep_size on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - primary_slot_name is temporarily reset to null and the physical slot is removed. Changing the primary wal_level to a level lower than logical is only possible if the logical slots are removed on the primary, so it's expected to see the slots being removed on the standby too (and re-created if they are re-created on the primary). ===== Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > --- > > Since Two processes (e.g. the slotsync worker and > > pg_sync_replication_slots()) concurrently fetch and update the slot > > information, there is a race condition where slot's > > confirmed_flush_lsn goes backward. > > > > Right, this is possible, though there shouldn't be a problem because > anyway, slotsync is an async process. Till we hold restart_lsn, the > required WAL won't be removed. Having said that, I can think of two > ways to avoid it: (a) We can have some flag in shared memory using > which we can detect whether any other process is doing slot > syncronization and then either error out at that time or simply wait > or may take nowait kind of parameter from user to decide what to do? > If this is feasible, we can simply error out for the first version and > extend it later if we see any use cases for the same (b) similar to > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > error, this is good for now but in future we may still have another > similar issue, so I would prefer (a) among these but I am fine if you > prefer (b) or have some other ideas like just note down in comments > that this is a harmless case and can happen only very rarely. Thank you for sharing the ideas. I would prefer (a). For (b), the same issue still happens for other fields. > > > > > --- > > + It is recommended that subscriptions are first disabled before promoting > > f+ the standby and are enabled back after altering the connection string. > > > > I think it's better to describe the reason why it's recommended to > > disable subscriptions before the standby promotion. > > > > Agreed. The reason I see for this is that if we don't disable the > subscription before promotion and changing the connection string there > is a chance that the old primary comes back and the subscriber can > have some additional data, though the chances of same are less. > > > --- > > +/* Slot sync worker objects */ > > +extern PGDLLIMPORT char *PrimaryConnInfo; > > +extern PGDLLIMPORT char *PrimarySlotName; > > > > These two variables are declared also in xlogrecovery.h. Is it > > intentional? If so, I think it's better to write comments. > > > > --- > > Global functions and variables used by the slotsync worker are > > declared in logicalworker.h and worker_internal.h. But is it really > > okay to make a dependency between the slotsync worker and logical > > replication workers? IIUC the slotsync worker is conceptually a > > separate feature from the logical replication. I think the slotsync > > worker can have its own header file. > > > > +1. > > > > > --- > > + Confirm that the standby server is not lagging behind the subscribers. > > + This step can be skipped if > > + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> > > + has been correctly configured. > > > > How can the user confirm if standby_slot_names is correctly configured? > > > > I think users can refer to LOGs to see if it has changed since the > first time it was configured. I tried by existing parameter and see > the following in LOG: > LOG: received SIGHUP, reloading configuration files > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on" > > If the user can't confirm then it is better to follow the steps > mentioned in the patch. Do you want something else to be written in > docs for this? If so, what? 
IIUC even if a wrong slot name is specified in standby_slot_names, or standby_slot_names is empty, the standby server might not be lagging behind the subscribers at the moment of checking, depending on the timing. But when checking again later, the standby server might lag behind the subscribers. So what I wanted to know is how the user can confirm that a failover-enabled subscription is guaranteed not to get ahead of the failover-candidate standbys (i.e., standbys using the slots listed in standby_slot_names). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > --- > > > Since Two processes (e.g. the slotsync worker and > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > information, there is a race condition where slot's > > > confirmed_flush_lsn goes backward. > > > > > > > Right, this is possible, though there shouldn't be a problem because > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > required WAL won't be removed. Having said that, I can think of two > > ways to avoid it: (a) We can have some flag in shared memory using > > which we can detect whether any other process is doing slot > > syncronization and then either error out at that time or simply wait > > or may take nowait kind of parameter from user to decide what to do? > > If this is feasible, we can simply error out for the first version and > > extend it later if we see any use cases for the same (b) similar to > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > error, this is good for now but in future we may still have another > > similar issue, so I would prefer (a) among these but I am fine if you > > prefer (b) or have some other ideas like just note down in comments > > that this is a harmless case and can happen only very rarely. > > Thank you for sharing the ideas. I would prefer (a). For (b), the same > issue still happens for other fields. I agree that (a) looks better. On a separate note, while looking at this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there be an optional parameter to give one slot or multiple slots or all slots as default, that will give better control to the user no? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
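If such an option were added, the call might look something like this (a purely hypothetical signature and slot names, sketched only to illustrate the idea; the function in the current patch takes no arguments):

  -- current behaviour: synchronize all failover slots from the primary
  SELECT pg_sync_replication_slots();
  -- hypothetical extension: synchronize only the named slots
  SELECT pg_sync_replication_slots(ARRAY['lsub1_slot', 'lsub2_slot']);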
On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > I think users can refer to LOGs to see if it has changed since the > > first time it was configured. I tried by existing parameter and see > > the following in LOG: > > LOG: received SIGHUP, reloading configuration files > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on" > > > > If the user can't confirm then it is better to follow the steps > > mentioned in the patch. Do you want something else to be written in > > docs for this? If so, what? > > IIUC even if a wrong slot name is specified to standby_slot_names or > even standby_slot_names is empty, the standby server might not be > lagging behind the subscribers depending on the timing. But when > checking it the next time, the standby server might lag behind the > subscribers. So what I wanted to know is how the user can confirm if a > failover-enabled subscription is ensured not to go in front of > failover-candidate standbys (i.e., standbys using the slots listed in > standby_slot_names). > But isn't the same explained by two steps ((a) Firstly, on the subscriber node check the last replayed WAL. (b) Next, on the standby server check that the last-received WAL location is ahead of the replayed WAL location on the subscriber identified above.) in the latest *_0004 patch. -- With Regards, Amit Kapila.
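A sketch of those two steps as queries (the subscription OID 16391 is illustrative; the 0004 patch's documented queries derive the 'pg_<subid>' origin names from pg_subscription and pg_subscription_rel):

  -- (a) on the subscriber: the last publisher LSN applied for the subscription
  SELECT pg_replication_origin_progress('pg_16391', false);
  -- (b) on the standby: the last WAL location received from the primary;
  --     the standby is not lagging for that subscription if this is >= the value from (a)
  SELECT pg_last_wal_receive_lsn();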
On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > --- > > > > Since Two processes (e.g. the slotsync worker and > > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > > information, there is a race condition where slot's > > > > confirmed_flush_lsn goes backward. > > > > > > > > > > Right, this is possible, though there shouldn't be a problem because > > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > > required WAL won't be removed. Having said that, I can think of two > > > ways to avoid it: (a) We can have some flag in shared memory using > > > which we can detect whether any other process is doing slot > > > syncronization and then either error out at that time or simply wait > > > or may take nowait kind of parameter from user to decide what to do? > > > If this is feasible, we can simply error out for the first version and > > > extend it later if we see any use cases for the same (b) similar to > > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > > error, this is good for now but in future we may still have another > > > similar issue, so I would prefer (a) among these but I am fine if you > > > prefer (b) or have some other ideas like just note down in comments > > > that this is a harmless case and can happen only very rarely. > > > > Thank you for sharing the ideas. I would prefer (a). For (b), the same > > issue still happens for other fields. > > I agree that (a) looks better. On a separate note, while looking at > this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there > be an optional parameter to give one slot or multiple slots or all > slots as default, that will give better control to the user no? > As of now, we want to give functionality similar to slotsync worker with a difference that users can use this new function for planned switchovers. So, syncing all failover slots by default. I think if there is a use case to selectively sync some of the failover slots then we can probably extend this function and slotsync worker as well. Normally, if the primary goes down due to whatever reason users would want to restart the replication for all the defined publications via existing failover slots. Why would anyone want to do it partially? -- With Regards, Amit Kapila.
On Tue, Feb 6, 2024 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > --- > > > > > Since Two processes (e.g. the slotsync worker and > > > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > > > information, there is a race condition where slot's > > > > > confirmed_flush_lsn goes backward. > > > > > > > > > > > > > Right, this is possible, though there shouldn't be a problem because > > > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > > > required WAL won't be removed. Having said that, I can think of two > > > > ways to avoid it: (a) We can have some flag in shared memory using > > > > which we can detect whether any other process is doing slot > > > > syncronization and then either error out at that time or simply wait > > > > or may take nowait kind of parameter from user to decide what to do? > > > > If this is feasible, we can simply error out for the first version and > > > > extend it later if we see any use cases for the same (b) similar to > > > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > > > error, this is good for now but in future we may still have another > > > > similar issue, so I would prefer (a) among these but I am fine if you > > > > prefer (b) or have some other ideas like just note down in comments > > > > that this is a harmless case and can happen only very rarely. > > > > > > Thank you for sharing the ideas. I would prefer (a). For (b), the same > > > issue still happens for other fields. > > > > I agree that (a) looks better. On a separate note, while looking at > > this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there > > be an optional parameter to give one slot or multiple slots or all > > slots as default, that will give better control to the user no? > > > > As of now, we want to give functionality similar to slotsync worker > with a difference that users can use this new function for planned > switchovers. So, syncing all failover slots by default. I think if > there is a use case to selectively sync some of the failover slots > then we can probably extend this function and slotsync worker as well. > Normally, if the primary goes down due to whatever reason users would > want to restart the replication for all the defined publications via > existing failover slots. Why would anyone want to do it partially? If we consider the usability of such a function (I mean as it is implemented now, without any argument) one use case could be that if the slot sync worker is not keeping up or at some point in time the user doesn't want to wait for the worker to do this instead user can do it by himself. So now if we have such a functionality then it would be even better to extend it to selectively sync the slot. For example, if there is some issue in syncing all slots, maybe some bug or taking a long time to sync because there are a lot of slots but if the user needs to quickly failover and he/she is interested in only a couple of slots then such a option could be helpful. no? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 6, 2024 at 9:35 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ====== > GENERAL > > 1. > Should the "Chapter 30 Logical Replication" at least have another > section that mentions the feature of slot synchronization so the > information about it is easier to find? It doesn't need to say much -- > just give a reference to the other sections where it is explained > already. > We can think of something like Failover/Switchover but that we can do at the end once we get the worker patch and other work not with the first patch. > > 6. > + <row> > + <entry id="pg-sync-replication-slots" > role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_sync_replication_slots</primary> > > Currently, this is in section "9.27.6 Replication Management > Functions", but I wondered if it should also have some mention in the > "9.27.4. Recovery Control Functions" section. > I feel this is more suited to "Replication Management Functions" because the other section talks about functions used during recovery whereas we won't do anything for slotsync during recovery. -- With Regards, Amit Kapila.
On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I think users can refer to LOGs to see if it has changed since the > > > first time it was configured. I tried by existing parameter and see > > > the following in LOG: > > > LOG: received SIGHUP, reloading configuration files > > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on" > > > > > > If the user can't confirm then it is better to follow the steps > > > mentioned in the patch. Do you want something else to be written in > > > docs for this? If so, what? > > > > IIUC even if a wrong slot name is specified to standby_slot_names or > > even standby_slot_names is empty, the standby server might not be > > lagging behind the subscribers depending on the timing. But when > > checking it the next time, the standby server might lag behind the > > subscribers. So what I wanted to know is how the user can confirm if a > > failover-enabled subscription is ensured not to go in front of > > failover-candidate standbys (i.e., standbys using the slots listed in > > standby_slot_names). > > > > But isn't the same explained by two steps ((a) Firstly, on the > subscriber node check the last replayed WAL. (b) Next, on the standby > server check that the last-received WAL location is ahead of the > replayed WAL location on the subscriber identified above.) in the > latest *_0004 patch. > Additionally, I would like to add that the users can use the queries mentioned in the doc after the primary has failed and before promoting the standby. If she wants to do that when both primary and standby are available, the value of 'standby_slot_names' on primary should be referred. Isn't those two sufficient that there won't be false positives? -- With Regards, Amit Kapila.
On Tue, Feb 6, 2024 at 3:57 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Feb 6, 2024 at 3:23 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > --- > > > > > > Since Two processes (e.g. the slotsync worker and > > > > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > > > > information, there is a race condition where slot's > > > > > > confirmed_flush_lsn goes backward. > > > > > > > > > > > > > > > > Right, this is possible, though there shouldn't be a problem because > > > > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > > > > required WAL won't be removed. Having said that, I can think of two > > > > > ways to avoid it: (a) We can have some flag in shared memory using > > > > > which we can detect whether any other process is doing slot > > > > > syncronization and then either error out at that time or simply wait > > > > > or may take nowait kind of parameter from user to decide what to do? > > > > > If this is feasible, we can simply error out for the first version and > > > > > extend it later if we see any use cases for the same (b) similar to > > > > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > > > > error, this is good for now but in future we may still have another > > > > > similar issue, so I would prefer (a) among these but I am fine if you > > > > > prefer (b) or have some other ideas like just note down in comments > > > > > that this is a harmless case and can happen only very rarely. > > > > > > > > Thank you for sharing the ideas. I would prefer (a). For (b), the same > > > > issue still happens for other fields. > > > > > > I agree that (a) looks better. On a separate note, while looking at > > > this API pg_sync_replication_slots(PG_FUNCTION_ARGS) shouldn't there > > > be an optional parameter to give one slot or multiple slots or all > > > slots as default, that will give better control to the user no? > > > > > > > As of now, we want to give functionality similar to slotsync worker > > with a difference that users can use this new function for planned > > switchovers. So, syncing all failover slots by default. I think if > > there is a use case to selectively sync some of the failover slots > > then we can probably extend this function and slotsync worker as well. > > Normally, if the primary goes down due to whatever reason users would > > want to restart the replication for all the defined publications via > > existing failover slots. Why would anyone want to do it partially? > > If we consider the usability of such a function (I mean as it is > implemented now, without any argument) one use case could be that if > the slot sync worker is not keeping up or at some point in time the > user doesn't want to wait for the worker to do this instead user can > do it by himself. > Possibly, but I was imagining that it would be used for planned switchover cases and also for testing the core sync slot functionality in our TAP tests. > So now if we have such a functionality then it would be even better to > extend it to selectively sync the slot. 
For example, if there is some > issue in syncing all slots, maybe some bug or taking a long time to > sync because there are a lot of slots but if the user needs to quickly > failover and he/she is interested in only a couple of slots then such > a option could be helpful. no? > I see your point but not sure how useful it is in the field. I am fine if others also think such a parameter will be useful and anyway I think we can even extend it after v1 is done. -- With Regards, Amit Kapila.
On Tue, Feb 6, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi, I took another high-level look at all the funtion names of the > slotsync.c file. > > > Below are some suggestions (some are unchanged); probably there are > better ideas for names but my point is that the current names could be > improved: > > CURRENT SUGGESTION ... > check_sync_slot_on_remote check_local_synced_slot_exists_on_remote > I think none of this seems to state the purpose of the function. I suggest changing it to local_sync_slot_required() and returning false either if the local_slot doesn't exist in remote_slot_list or is invalidated. -- With Regards, Amit Kapila.
On Tuesday, February 6, 2024 3:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > > > --- > > > Since Two processes (e.g. the slotsync worker and > > > pg_sync_replication_slots()) concurrently fetch and update the slot > > > information, there is a race condition where slot's > > > confirmed_flush_lsn goes backward. > > > > > > > Right, this is possible, though there shouldn't be a problem because > > anyway, slotsync is an async process. Till we hold restart_lsn, the > > required WAL won't be removed. Having said that, I can think of two > > ways to avoid it: (a) We can have some flag in shared memory using > > which we can detect whether any other process is doing slot > > syncronization and then either error out at that time or simply wait > > or may take nowait kind of parameter from user to decide what to do? > > If this is feasible, we can simply error out for the first version and > > extend it later if we see any use cases for the same (b) similar to > > restart_lsn, if confirmed_flush_lsn is getting moved back, raise an > > error, this is good for now but in future we may still have another > > similar issue, so I would prefer (a) among these but I am fine if you > > prefer (b) or have some other ideas like just note down in comments > > that this is a harmless case and can happen only very rarely. > > Thank you for sharing the ideas. I would prefer (a). For (b), the same issue still > happens for other fields. Attached is the V79 patch, which includes the following changes. (Note that only 0001 is sent in this version; we will send the later patches after rebasing.) 1. Addressed all the comments from Amit[1], all the comments from Peter[2] and some of the comments from Sawada-san[3]. 2. Used a flag in shared memory to restrict concurrent slot sync. 3. Added more TAP tests for the pg_sync_replication_slots function. [1] https://www.postgresql.org/message-id/CAA4eK1KGHT9S-Bst_G1CUNQvRep%3DipMs5aTBNRQFVi6TogbJ9w%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPtyoRf3adoLoTrbL6momzkhXAFKz656Vv9YRu4cp%3D6Yig%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAD21AoCEkcTaPb%2BGdOhSQE49_mKJG6D64quHcioJGx6RCqMv%2BQ%40mail.gmail.com Best Regards, Hou zj
Attachment
On Monday, February 5, 2024 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Feb 5, 2024 at 8:26 PM shveta malik <shveta.malik@gmail.com> > wrote: > > > > On Mon, Feb 5, 2024 at 10:57 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > Just noticed that doc/src/sgml/config.sgml still refers to enable_synclot > instead of sync_replication_slots: > > > > > > The standbys corresponding to the physical replication slots in > > > <varname>standby_slot_names</varname> must configure > > > <literal>enable_syncslot = true</literal> so they can receive > > > failover logical slots changes from the primary. > > > > Thanks Ajin for pointing this out. Here are v78 patches, corrected there. > > > > Other changes are: > > > > 1) Rebased the patches as the v77-001 is now pushed. > > 2) Enabled executing pg_sync_replication_slots() on cascading-standby. > > 3) Rearranged the code around parameter validity checks. Changed > > function names and changed the way how dbname is extracted as > > suggested by Amit offlist. > > 4) Rearranged the code around check_primary_info(). Removed output > args. > > 5) Few other trivial changes. > > > > Thank you for updating the patch! Here are some comments: > > --- > Since Two processes (e.g. the slotsync worker and > pg_sync_replication_slots()) concurrently fetch and update the slot information, > there is a race condition where slot's confirmed_flush_lsn goes backward. . We > have the following check but it doesn't prevent the slot's confirmed_flush_lsn > from moving backward if the restart_lsn does't change: > > /* > * Sanity check: As long as the invalidations are handled > * appropriately as above, this should never happen. > */ > if (remote_slot->restart_lsn < slot->data.restart_lsn) > elog(ERROR, > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > " to remote slot's LSN(%X/%X) as synchronization" > " would move it backwards", remote_slot->name, > LSN_FORMAT_ARGS(slot->data.restart_lsn), > LSN_FORMAT_ARGS(remote_slot->restart_lsn)); > As discussed, I added a flag in shared memory to control the concurrent slot sync. > --- > + It is recommended that subscriptions are first disabled before > + promoting > f+ the standby and are enabled back after altering the connection string. > > I think it's better to describe the reason why it's recommended to disable > subscriptions before the standby promotion. Added. > > --- > +/* Slot sync worker objects */ > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char > +*PrimarySlotName; > > These two variables are declared also in xlogrecovery.h. Is it intentional? If so, I > think it's better to write comments. Will address. > > --- > Global functions and variables used by the slotsync worker are declared in > logicalworker.h and worker_internal.h. But is it really okay to make a > dependency between the slotsync worker and logical replication workers? IIUC > the slotsync worker is conceptually a separate feature from the logical > replication. I think the slotsync worker can have its own header file. Added. > > --- > + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname > > and > > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false) > > If we use CONCAT function, we can replace '||' with ','. > Will address.
> --- > + Confirm that the standby server is not lagging behind the subscribers. > + This step can be skipped if > + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> > + has been correctly configured. > > How can the user confirm if standby_slot_names is correctly configured? Will address after concluding. Thanks Shveta for helping address the comments. Best Regards, Hou zj
On Tue, Feb 6, 2024 at 8:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > I think users can refer to LOGs to see if it has changed since the > > > > first time it was configured. I tried by existing parameter and see > > > > the following in LOG: > > > > LOG: received SIGHUP, reloading configuration files > > > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" changed to "on" > > > > > > > > If the user can't confirm then it is better to follow the steps > > > > mentioned in the patch. Do you want something else to be written in > > > > docs for this? If so, what? > > > > > > IIUC even if a wrong slot name is specified to standby_slot_names or > > > even standby_slot_names is empty, the standby server might not be > > > lagging behind the subscribers depending on the timing. But when > > > checking it the next time, the standby server might lag behind the > > > subscribers. So what I wanted to know is how the user can confirm if a > > > failover-enabled subscription is ensured not to go in front of > > > failover-candidate standbys (i.e., standbys using the slots listed in > > > standby_slot_names). > > > > > > > But isn't the same explained by two steps ((a) Firstly, on the > > subscriber node check the last replayed WAL. (b) Next, on the standby > > server check that the last-received WAL location is ahead of the > > replayed WAL location on the subscriber identified above.) in the > > latest *_0004 patch. > > > > Additionally, I would like to add that the users can use the queries > mentioned in the doc after the primary has failed and before promoting > the standby. If she wants to do that when both primary and standby are > available, the value of 'standby_slot_names' on primary should be > referred. Isn't those two sufficient that there won't be false > positives? From a user perspective, I'd like to confirm the following two points : 1. replication slots used by subscribers are synchronized to the standby. 2. it's guaranteed that logical replication doesn't go ahead of physical replication to the standby. These checks are necessary at least when building a replication setup (primary, standby, and subscriber). Otherwise, it's too late if we find out that no standby is failover-ready when the primary fails and we're about to do a failover. As for the point 1 above, we can use the step 1 described in the doc. As for point 2, the step 2 described in the doc could return true even if standby_slot_names isn't working. For example, standby_slot_names is empty, the user changed the standby_slot_names but forgot to reload the config file, and the walsender doesn't reflect the standby_slot_names update yet for some reason etc. It's possible that standby's last-received WAL location just happens to be ahead of the replayed WAL location on the subscriber. So even if the check query returns true once, it could return false when we check it again, if standby_slot_names is not working. On the other hand, IIUC if the point 2 is ensured, the check query always returns true. I think it would be good if we could provide a reliable way to check point 2 ideally via SQL queries (especially for tools). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
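For point 1, a minimal check on the standby might look like this (a sketch; synced is the flag added by the 0001 patch, and a synced slot only becomes usable for failover once it is persisted, i.e. no longer temporary):

  SELECT slot_name, synced, temporary FROM pg_replication_slots WHERE failover;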
Here are some review comments for v79-0001 ====== Commit message 1. The logical replication slots on the primary can be synchronized to the hot standby by enabling the "failover" parameter during pg_create_logical_replication_slot() or by enabling "failover" option of the CREATE SUBSCRIPTION command and calling pg_sync_replication_slots() function on the standby. ~ SUGGESTION The logical replication slots on the primary can be synchronized to the hot standby by enabling failover during slot creation (e.g. using the "failover" parameter of pg_create_logical_replication_slot(), or using the "failover" option of the CREATE SUBSCRIPTION command), and then calling pg_sync_replication_slots() function on the standby. ====== 2. + <caution> + <para> + If after executing the function, hot_standby_feedback is disabled on + the standby or the physical slot configured in primary_slot_name is + removed, then it is possible that the necessary rows of the + synchronized slot will be removed by the VACUUM process on the primary + server, resulting in the synchronized slot becoming invalidated. + </para> + </caution> 2a. /If after/If, after/ ~ 2b. Use SGML <variable> for the GUC names (hot_standby_feedback, and primary_slot_name), and consider putting links for them as well. ====== src/sgml/logicaldecoding.sgml 3. + <sect2 id="logicaldecoding-replication-slots-synchronization"> + <title>Replication Slot Synchronization</title> + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the <literal>failover</literal> option for the slot + and calling <function>pg_sync_replication_slots</function> + on the standby. The <literal>failover</literal> option of the slot + can be enabled either by enabling + <link linkend="sql-createsubscription-params-with-failover"> + <literal>failover</literal></link> + option during subscription creation or by providing <literal>failover</literal> + parameter during + <link linkend="pg-create-logical-replication-slot"> + <function>pg_create_logical_replication_slot</function></link>. IMO it will be better to slightly reword this (like was suggested for the Commit Message). I felt it is also better to refer/link to "CREATE SUBSCRIPTION" instead of saying "during subscription creation". SUGGESTION The logical replication slots on the primary can be synchronized to the hot standby by enabling failover during slot creation (e.g. using the "failover" parameter of pg_create_logical_replication_slot, or using the "failover" option of the CREATE SUBSCRIPTION command), and then calling pg_sync_replication_slots() function on the standby. ~~~ 4. + There are chances that the old primary is up again during the promotion + and if subscriptions are not disabled, the logical subscribers may keep + on receiving the data from the old primary server even after promotion + until the connection string is altered. This may result in the data + inconsistency issues and thus the logical subscribers may not be able + to continue the replication from the new primary server. + </para> 4a. /There are chances/There is a chance/ /may keep on receiving the data/may continue to receive data/ ~ 4b. BEFORE This may result in the data inconsistency issues and thus the logical subscribers may not be able to continue the replication from the new primary server. SUGGESTION This might result in data inconsistency issues, preventing the logical subscribers from being able to continue replication from the new primary server. ~ 4c. 
I felt this whole part "There is a chance..." should be rendered as a <note> or a <caution> or something. ====== src/backend/replication/logical/slotsync.c 5. +/* + * Return true if all necessary GUCs for slot synchronization are set + * appropriately, otherwise return false. + */ +bool +ValidateSlotSyncParams(void) +{ + char *dbname; + + /* + * A physical replication slot(primary_slot_name) is required on the + * primary to ensure that the rows needed by the standby are not removed + * after restarting, so that the synchronized slot on the standby will not + * be invalidated. + */ + if (PrimarySlotName == NULL || *PrimarySlotName == '\0') + { + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_slot_name")); + return false; + } + + /* + * hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. + */ + if (!hot_standby_feedback) + { + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be enabled.", "hot_standby_feedback")); + return false; + } + + /* + * Logical decoding requires wal_level >= logical and we currently only + * synchronize logical slots. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + { + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"wal_level\" must be >= logical.")); + return false; + } + + /* + * The primary_conninfo is required to make connection to primary for + * getting slots information. + */ + if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') + { + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_conninfo")); + return false; + } + + /* + * The slot synchronization needs a database connection for walrcv_exec to + * work. + */ + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (dbname == NULL) + { + ereport(ERROR, + + /* + * translator: 'dbname' is a specific option; %s is a GUC variable + * name + */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("'dbname' must be specified in \"%s\".", "primary_conninfo")); + return false; + } + + return true; +} The code of this function has been flip-flopping between versions. Now, it is always giving an ERROR when something is wrong, so all of the "return false" are unreachable. It also means the function comment is wrong, and the boolean return is unused/unnecessary. ~~~ 6. SlotSyncShmemInit +/* + * Allocate and initialize slot sync shared memory. + */ This comment should use the same style wording as the other nearby shmem function comments. SUGGESTION Allocate and initialize the shared memory of slot synchronization. ~~~ 7. +/* + * Cleanup the shared memory of slot synchronization. + */ +static void +SlotSyncShmemExit(int code, Datum arg) Since this is static, should it use the snake case naming convention? -- e.g. slot_sync_shmem_exit. ~~~ 8. +/* + * Register the callback function to clean up the shared memory of slot + * synchronization. 
+ */ +void +SlotSyncInitialize(void) +{ + before_shmem_exit(SlotSyncShmemExit, 0); +} This is only doing registration for cleanup of shmem stuff. So, does it really need it to be a separate function, or can this be registered within SlotSyncShmemInit() itself? ~~~ 9. SyncReplicationSlots + PG_TRY(); + { + validate_primary_slot_name(wrconn); + + (void) synchronize_slots(wrconn); + } + PG_FINALLY(); + { + if (syncing_slots) + { + SpinLockAcquire(&SlotSyncCtx->mutex); + SlotSyncCtx->syncing = false; + SpinLockRelease(&SlotSyncCtx->mutex); + + syncing_slots = false; + } + + walrcv_disconnect(wrconn); + } + PG_END_TRY(); IIUC, the "if (syncing_slots)" part is not really for normal operation, but it is a safe-guard for cleaning up if some unexpected ERROR happens. Maybe there should be a comment to say that. ====== src/test/recovery/t/040_standby_failover_slots_sync.pl 10. +# Confirm that the logical failover slot is created on the standby and is +# flagged as 'synced' +is($standby1->safe_psql('postgres', + q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot') AND synced;}), + "t", + 'logical slots have synced as true on standby'); /slot is created/slots are created/ /and is flagged/and are flagged/ ====== Kind Regards, Peter Smith. Fujitsu Australia
On Tue, Feb 6, 2024 at 7:19 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V79 patch which includes the following changes. (Note that only > 0001 is sent in this version, we will send the later patches after rebasing) Thanks Hou-San. Please find the rebased patches. There was a conflict after the recent merge, so I rebased patch001. Patch002 and patch004 address a few of Sawada-san's pending comments. No change in patch003 except rebasing. thanks Shveta
Attachment
On Tue, Feb 6, 2024 at 7:21 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > --- > > +/* Slot sync worker objects */ > > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char > > +*PrimarySlotName; > > > > These two variables are declared also in xlogrecovery.h. Is it intentional? If so, I > > think it's better to write comments. > > Will address. Added comments in v79_2. > > > > > --- > > + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || > > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname > > > > and > > > > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false) > > > > If we use CONCAT function, we can replace '||' with ','. > > Modified in v79_2. thanks Shveta
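As a trivial illustration of that review comment (the OIDs are made up), CONCAT takes a comma-separated argument list, so the '||' operators inside it are redundant:

SELECT 'pg_' || 16394 || '_sync_' || 16401 AS using_operator,
       CONCAT('pg_', 16394, '_sync_', 16401) AS using_concat;

Both expressions build the same slot name string, here 'pg_16394_sync_16401'.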
On Mon, Feb 5, 2024 at 9:19 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, February 1, 2024 12:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Feb 1, 2024 at 8:15 AM Euler Taveira <euler@eulerto.com> wrote: > > > > > > > > > While working on another patch I noticed a new NOTICE message: > > > > > > NOTICE: changed the failover state of replication slot "foo" on publisher to > > false > > > > > > I wasn't paying much attention to this thread then I start reading the 2 > > > patches that was recently committed. The message above surprises me > > because > > > pg_createsubscriber starts to emit this message. The reason is that it doesn't > > > create the replication slot during the CREATE SUBSCRIPTION. Instead, it > > creates > > > the replication slot with failover = false and no such option is informed > > > during CREATE SUBSCRIPTION which means it uses the default value (failover > > = > > > false). I expect that I don't see any message because it is *not* changing the > > > behavior. I was wrong. It doesn't check the failover state on publisher, it > > > just executes walrcv_alter_slot() and emits a message. > > > > > > IMO if we are changing an outstanding property on node A from node B, > > node B > > > already knows (or might know) about that behavior change (because it is > > sending > > > the command), however, node A doesn't (unless log_replication_commands > > = on -- > > > it is not the default). > > > > > > Do we really need this message as NOTICE? > > > > > > > The reason for adding this NOTICE was to keep it similar to other > > Notice messages in these commands like create/drop slot. However, here > > the difference is we may not have altered the slot as the property is > > already the same as we want to set on the publisher. So, I am not sure > > whether we should follow the existing behavior or just get rid of it. > > And then do we remove similar NOTICE in AlterSubscription() as well? > > Normally, I think NOTICE intends to let users know if we did anything > > with slots while executing subscription commands. Does anyone else > > have an opinion on this point? > > > > A related point, I think we can avoid setting the 'failover' property > > in ReplicationSlotAlter() if it is not changed, the advantage is we > > will avoid saving slots. OTOH, this won't be a frequent operation so > > we can leave it as it is as well. > > Here is a patch to remove the NOTICE and improve the ReplicationSlotAlter. > The patch also includes few cleanups based on Peter's feedback. > Thanks for the patch. Pushed. -- With Regards, Amit Kapila.
On Wed, Feb 7, 2024 at 9:30 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v79-0001 Thanks for the feedback. Addressed the comments in v80 patch-set. Please find my response inline for few. > src/sgml/logicaldecoding.sgml > 3. > + <sect2 id="logicaldecoding-replication-slots-synchronization"> > + <title>Replication Slot Synchronization</title> > + <para> > + A logical replication slot on the primary can be synchronized to the hot > + standby by enabling the <literal>failover</literal> option for the slot > + and calling <function>pg_sync_replication_slots</function> > + on the standby. The <literal>failover</literal> option of the slot > + can be enabled either by enabling > + <link linkend="sql-createsubscription-params-with-failover"> > + <literal>failover</literal></link> > + option during subscription creation or by providing > <literal>failover</literal> > + parameter during > + <link linkend="pg-create-logical-replication-slot"> > + <function>pg_create_logical_replication_slot</function></link>. > > IMO it will be better to slightly reword this (like was suggested for > the Commit Message). I felt it is also better to refer/link to "CREATE > SUBSCRIPTION" instead of saying "during subscription creation". Regarding link to create-sub, the 'sql-createsubscription-params-with-failover' takes you to the failover property of Create-Subscription page. Won't that suffice? > > 8. > +/* > + * Register the callback function to clean up the shared memory of slot > + * synchronization. > + */ > +void > +SlotSyncInitialize(void) > +{ > + before_shmem_exit(SlotSyncShmemExit, 0); > +} > > This is only doing registration for cleanup of shmem stuff. So, does > it really need it to be a separate function, or can this be registered > within SlotSyncShmemInit() itself? I think it makes more sense to call it from BaseInit() where we have all such calls like InitTemporaryFileAccess(), ReplicationSlotInitialize() etc which do similar callback registrations using before_shmem_exit(). Attached the patches for v80. Overall changes are: --Addressed comments by Peter (which I responded above) and Amit given in [1] and [2]. --Also improved commit msg and comment around 'wal_level' as suggested by Bertrand in [3]. [1]: https://www.postgresql.org/message-id/CAHut%2BPvtysbVd8tj2AADk%3DeNo0VY9Ov9wkBP-K%2B9tj1wRS4M4w%40mail.gmail.com [2]: https://www.postgresql.org/message-id/CAA4eK1%2Bar0N1xXnZZ26BG1qO4LHRS8v3wnH9Pnz4BWmk6SDTHw%40mail.gmail.com [3]: https://www.postgresql.org/message-id/ZcHX4SXkqtGe27a6%40ip-10-97-1-34.eu-west-3.compute.internal thanks Shveta
Attachment
On Tue, Feb 6, 2024 at 12:25 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > That said, I still think the commit message needs some re-wording, what about? > > ===== > If a logical slot on the primary is valid but is invalidated on the standby, > then that slot is dropped and can be recreated on the standby in next > pg_sync_replication_slots() call provided the slot still exists on the primary > server. It is okay to recreate such slots as long as these are not consumable > on the standby (which is the case currently). This situation may occur due to > the following reasons: > > - The max_slot_wal_keep_size on the standby is insufficient to retain WAL > records from the restart_lsn of the slot. > - primary_slot_name is temporarily reset to null and the physical slot is > removed. > > Changing the primary wal_level to a level lower than logical is only possible > if the logical slots are removed on the primary, so it's expected to see > the slots being removed on the standby too (and re-created if they are > re-created on the primary). > ===== Thanks for the feedback. I have incorporated the suggestions in v80. thanks Shveta
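For illustration, such invalidations can be spotted on the standby with something like the query below (a sketch only; 'synced' is the column added by this patch set, while 'wal_status' already exists in pg_replication_slots):

SELECT slot_name, synced, wal_status
FROM pg_replication_slots
WHERE synced;

A synced slot whose wal_status has become 'lost' (for example because max_slot_wal_keep_size on the standby was too small) is expected to be dropped and then re-created on a later sync, provided the slot still exists on the primary.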
On Tue, Feb 6, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > There seems some muddling of names here: > - "local" versus ? and "remote" versus "primary"; or sometimes the > function does not give an indication. > - "sync_slot" versus "synced_slot" versus nothing > - "check" versus "validate" > - etc. > > Below are some suggestions (some are unchanged); probably there are > better ideas for names but my point is that the current names could be > improved: > > CURRENT SUGGESTION ... > drop_obsolete_slots drop_local_synced_slots The new name doesn't convey the intent of the function. If we want to have a difference based on remote/local slots then we can probably name it as drop_local_obsolete_slots. > reserve_wal_for_slot reserve_wal_for_local_slot > local_slot_update update_local_synced_slot > update_and_persist_slot update_and_persist_local_synced_slot > The new names sound better in the above cases as the current names appear too generic. > get_slot_invalidation_cause get_slot_conflict_reason > synchronize_slots synchronize_remote_slots_to_local > synchronize_one_slot synchronize_remote_slot_to_local > The new names don't sound like an improvement. > validate_primary_slot check_remote_synced_slot_exists > validate_slotsync_params check_local_config > In the above cases, the current name conveys the intent of function whereas new names sound a bit generic. So, let's not change in this case. -- With Regards, Amit Kapila.
> > So now if we have such a functionality then it would be even better to > > extend it to selectively sync the slot. For example, if there is some > > issue in syncing all slots, maybe some bug or taking a long time to > > sync because there are a lot of slots but if the user needs to quickly > > failover and he/she is interested in only a couple of slots then such > > a option could be helpful. no? > > > > I see your point but not sure how useful it is in the field. I am fine > if others also think such a parameter will be useful and anyway I > think we can even extend it after v1 is done. > Okay, I am fine with that. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Feb 7, 2024 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > ... > > drop_obsolete_slots drop_local_synced_slots > > The new name doesn't convey the intent of the function. If we want to > have a difference based on remote/local slots then we can probably > name it as drop_local_obsolete_slots. > > > reserve_wal_for_slot reserve_wal_for_local_slot > > local_slot_update update_local_synced_slot > > update_and_persist_slot update_and_persist_local_synced_slot > > > > The new names sound better in the above cases as the current names > appear too generic. Sure, made the suggested function name changes. Since there is no other change, I kept the version as v80_2. thanks Shveta
Attachment
We conducted stress testing for the patch with a setup of one primary node with 100 tables and five subscribers, each having 20 subscriptions. We then created three physical standbys syncing the logical replication slots from the primary node. All 100 slots were successfully synced on all three standbys. We then ran the load and monitored LSN convergence using the prescribed SQL checks. Once the standbys were failover-ready, we were able to successfully promote one of the standbys, and all the subscribers seamlessly migrated to the new primary node. We repeated the tests with 200 tables, creating 200 logical replication slots. With the increased load, all the tests completed successfully. Minor errors (not due to the patch) were observed during the tests: 1) When the load was run, the logical replication apply workers on the subscribers started failing due to timeouts. This is not related to the patch, as it happened due to the small "wal_receiver_timeout" setting w.r.t. the load. To confirm, we ran the same load without the patch too, and the same failure happened. 2) There was a buffer overflow exception on the primary node with the '200 replication slots' case. It was not related to the patch, as it was due to an insufficient memory configuration. All the tests were done on Windows as well as Linux environments. Thank you, Ajin, for the stress test and analysis on Linux.
On Wednesday, February 7, 2024 9:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Feb 6, 2024 at 8:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Feb 6, 2024 at 3:33 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > On Tue, Feb 6, 2024 at 1:09 PM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > > > > > On Tue, Feb 6, 2024 at 3:19 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > > > > > I think users can refer to LOGs to see if it has changed since > > > > > the first time it was configured. I tried by existing parameter > > > > > and see the following in LOG: > > > > > LOG: received SIGHUP, reloading configuration files > > > > > 2024-02-06 11:38:59.069 IST [9240] LOG: parameter "autovacuum" > changed to "on" > > > > > > > > > > If the user can't confirm then it is better to follow the steps > > > > > mentioned in the patch. Do you want something else to be written > > > > > in docs for this? If so, what? > > > > > > > > IIUC even if a wrong slot name is specified to standby_slot_names > > > > or even standby_slot_names is empty, the standby server might not > > > > be lagging behind the subscribers depending on the timing. But > > > > when checking it the next time, the standby server might lag > > > > behind the subscribers. So what I wanted to know is how the user > > > > can confirm if a failover-enabled subscription is ensured not to > > > > go in front of failover-candidate standbys (i.e., standbys using > > > > the slots listed in standby_slot_names). > > > > > > > > > > But isn't the same explained by two steps ((a) Firstly, on the > > > subscriber node check the last replayed WAL. (b) Next, on the > > > standby server check that the last-received WAL location is ahead of > > > the replayed WAL location on the subscriber identified above.) in > > > the latest *_0004 patch. > > > > > > > Additionally, I would like to add that the users can use the queries > > mentioned in the doc after the primary has failed and before promoting > > the standby. If she wants to do that when both primary and standby are > > available, the value of 'standby_slot_names' on primary should be > > referred. Isn't those two sufficient that there won't be false > > positives? > > From a user perspective, I'd like to confirm the following two points : > > 1. replication slots used by subscribers are synchronized to the standby. > 2. it's guaranteed that logical replication doesn't go ahead of physical > replication to the standby. > > These checks are necessary at least when building a replication setup (primary, > standby, and subscriber). Otherwise, it's too late if we find out that no standby > is failover-ready when the primary fails and we're about to do a failover. > > As for the point 1 above, we can use the step 1 described in the doc. > > As for point 2, the step 2 described in the doc could return true even if > standby_slot_names isn't working. For example, standby_slot_names is empty, > the user changed the standby_slot_names but forgot to reload the config file, > and the walsender doesn't reflect the standby_slot_names update yet for some > reason etc. It's possible that standby's last-received WAL location just happens > to be ahead of the replayed WAL location on the subscriber. So even if the > check query returns true once, it could return false when we check it again, if > standby_slot_names is not working. On the other hand, IIUC if the point 2 is > ensured, the check query always returns true. 
I think it would be good if we > could provide a reliable way to check point 2 ideally via SQL queries (especially > for tools). Based on off-list discussions with Sawada-san and Amit, an alternative approach to improve this would be collecting the names of the standby slots that each walsender has waited for, which will be visible in the pg_stat_replication view. By checking this information, users can confirm that the GUC standby_slot_names is correctly configured and that logical replication is not lagging behind the standbys that hold these slots. To achieve this, we can implement the collection of slot information within each logical walsender that has acquired a failover slot, when waiting for the standby to catch up (WalSndWaitForWal). For each valid standby slot that the walsender has waited for, we will store the slot names in a shared memory area specific to each walsender. To optimize performance, we can rebuild the slot names only if the GUC has changed. We can track this by introducing a flag to monitor GUC modifications. When a user queries the pg_stat_replication view, we will retrieve the collected slot names from the shared memory area associated with each walsender. However, before returning the slot names, we can verify their validity once again. If any of the collected slots have been dropped or invalidated during this time, we will exclude them from the result returned to the user. Apart from the above design, I feel that since users currently have a way to detect this manually, as mentioned in the 0004 patch (we can improve the doc if needed), the new view info can be a separate improvement after pushing the main patch set. Best Regards, Hou zj
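Until something like the proposed pg_stat_replication addition exists, the manual check is roughly the following (a sketch that assumes the standby_slot_names GUC from this patch set; pg_stat_replication itself is unchanged):

-- On the primary: which physical slots failover-aware walsenders must wait for.
SHOW standby_slot_names;

-- And whether the corresponding physical walsenders are actually streaming.
SELECT application_name, state, replay_lsn
FROM pg_stat_replication;

This still cannot prove that a given logical walsender has honored standby_slot_names at a particular moment, which is the gap the proposed view change is meant to close.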
Here are some review comments for patch v80_2-0001. ====== Commit message 1. We may also see the the slots invalidated and dropped on the standby if the primary changes 'wal_level' to a level lower than logical. Changing the primary 'wal_level' to a level lower than logical is only possible if the logical slots are removed on the primary server, so it's expected to see the slots being removed on the standby too (and re-created if they are re-created on the primary server). ~ Typo /the the/the/ ====== src/sgml/logicaldecoding.sgml 2. + <para> + The logical replication slots on the primary can be synchronized to + the hot standby by enabling <literal>failover</literal> during slot + creation (e.g. using the <literal>failover</literal> parameter of + <link linkend="pg-create-logical-replication-slot"> + <function>pg_create_logical_replication_slot</function></link>, or + using the <link linkend="sql-createsubscription-params-with-failover"> + <literal>failover</literal></link> option of the CREATE SUBSCRIPTION + command), and then calling <link linkend="pg-sync-replication-slots"> + <function>pg_sync_replication_slots</function></link> + on the standby. For the synchronization to work, it is mandatory to + have a physical replication slot between the primary and the standby, and + <link linkend="guc-hot-standby-feedback"><varname>hot_standby_feedback</varname></link> + must be enabled on the standby. It is also necessary to specify a valid + <literal>dbname</literal> in the + <link linkend="guc-primary-conninfo"><varname>primary_conninfo</varname></link>. + </para> Shveta previously asked: Regarding link to create-sub, the 'sql-createsubscription-params-with-failover' takes you to the failover property of Create-Subscription page. Won't that suffice? PS: Yes, the current links in 80_2 are fine. ~ 2a. In hindsight, maybe it is simpler just to say "option of CREATE SUBSCRIPTION." instead of "option of the CREATE SUBSCRIPTION command." ~ 2b. Anyway, the "CREATE SUBSCRIPTION" should be rendered as a <command> ====== 3. +/* + * Flag to tell if we are syncing replication slots. Unlike the 'syncing' flag + * in SlotSyncCtxStruct, this flag is true only if the current process is + * performing slot synchronization. This flag is also used as safe-guard + * to clean-up shared 'syncing' flag of SlotSyncCtxStruct if some problem + * happens while we are in the process of synchronization. + */ 3a. It looks confusing to use the same word "process" to mean 2 different things. SUGGESTION This flag is also used as a safeguard to reset the shared 'syncing' flag of SlotSyncCtxStruct if some problem occurs while synchronizing. ~ 3b. TBH, I didn't think that 2nd sentence comment needed to be here -- it seemed more appropriate to say this comment inline where it does this logic in the function SyncReplicationSlots() ~~~ 4. local_sync_slot_required +/* + * Helper function to check if local_slot is required to be retained. + * + * Return false either if local_slot does not exist on the remote_slots list or + * is invalidated while the corresponding remote slot in the list is still + * valid, otherwise return true. + */ /does not exist on the remote_slots list/does not exist in the remote_slots list/ /while the corresponding remote slot in the list is still valid/while the corresponding remote slot is still valid/ ~~~ 5. + bool locally_invalidated = false; + bool remote_exists = false; + IMO it is more natural to declare these in the other order since the function logic assigns/tests them in the other order. ~~~ 6. 
+ + if (!remote_exists || locally_invalidated) + return false; + + return true; IMO it would be both simpler and easier to understand if this was written as one line: return remote_exists && !locally_invalidated; ~~~ 7. + * Note: Change of 'wal_level' on the primary server to a level lower than + * logical may also result in slots invalidation and removal on standby. This + * is because such 'wal_level' change is only possible if the logical slots + * are removed on the primary server, so it's expected to see the slots being + * invalidated and removed on the standby too (and re-created if they are + * re-created on the primary). /may also result in slots invalidation/may also result in slot invalidation/ /removal on standby/removal on the standby/ ~~~ 8. + /* Drop the local slot f it is not required to be retained. */ + if (!local_sync_slot_required(local_slot, remote_slot_list)) I didn't think this comment was needed because IMO the function name is self-explanatory. Anyway, if you do want to keep it, then there is a typo to fix: /f it is/if it is/ ~~~ 9. + * Update the LSNs and persist the local synced slot for further syncs if the + * remote restart_lsn and catalog_xmin have caught up with the local ones, + * otherwise do nothing. Something about "persist ... for further syncs" wording seems awkward to me but I wasn't sure exactly what it should be. When I fed this comment into ChatGPT it interpreted "further" as "future" which seemed better. e.g. If the remote restart_lsn and catalog_xmin have caught up with the local ones, then update the LSNs and store the local synced slot for future synchronization; otherwise, do nothing. Maybe that is a better way to express this comment? ~~~ 10. +/* + * Validates if all necessary GUCs for slot synchronization are set + * appropriately, otherwise raise ERROR. + */ /Validates if all/Check all/ ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Feb 7, 2024 at 5:32 PM shveta malik <shveta.malik@gmail.com> wrote: > > Sure, made the suggested function name changes. Since there is no > other change, I kept the version as v80_2. > Few comments on 0001 =================== 1. + * the slots on the standby and synchronize them. This is done on every call + * to SQL function pg_sync_replication_slots. > I think the second sentence can be slightly changed to: "This is done by a call to SQL function pg_sync_replication_slots." or "One can call SQL function pg_sync_replication_slots to invoke this functionality." 2. +update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid) { ... + SpinLockAcquire(&slot->mutex); + slot->data.plugin = plugin_name; + slot->data.database = remote_dbid; + slot->data.two_phase = remote_slot->two_phase; + slot->data.failover = remote_slot->failover; + slot->data.restart_lsn = remote_slot->restart_lsn; + slot->data.confirmed_flush = remote_slot->confirmed_lsn; + slot->data.catalog_xmin = remote_slot->catalog_xmin; + slot->effective_catalog_xmin = remote_slot->catalog_xmin; + SpinLockRelease(&slot->mutex); + + if (remote_slot->catalog_xmin != slot->data.catalog_xmin) + ReplicationSlotsComputeRequiredXmin(false); + + if (remote_slot->restart_lsn != slot->data.restart_lsn) + ReplicationSlotsComputeRequiredLSN(); ... } How is it possible that after assigning the values from remote_slot they can differ from local slot values? 3. + /* + * Find the oldest existing WAL segment file. + * + * Normally, we can determine it by using the last removed segment + * number. However, if no WAL segment files have been removed by a + * checkpoint since startup, we need to search for the oldest segment + * file currently existing in XLOGDIR. + */ + oldest_segno = XLogGetLastRemovedSegno() + 1; + + if (oldest_segno == 1) + oldest_segno = XLogGetOldestSegno(0); I feel this way isn't there a risk that XLogGetOldestSegno() will get us the seg number from some previous timeline which won't make sense to compare segno in reserve_wal_for_local_slot. Shouldn't you need to fetch the current timeline and send as a parameter to this function as that is the timeline on which standby is communicating with primary. 4. + if (remote_slot->confirmed_lsn > latestFlushPtr) + ereport(ERROR, + errmsg("skipping slot synchronization as the received slot sync" I think the internal errors should be reported with elog as you have done at other palces in the patch. 5. +synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) { ... + /* + * Copy the invalidation cause from remote only if local slot is not + * invalidated locally, we don't want to overwrite existing one. + */ + if (slot->data.invalidated == RS_INVAL_NONE) + { + SpinLockAcquire(&slot->mutex); + slot->data.invalidated = remote_slot->invalidated; + SpinLockRelease(&slot->mutex); + + /* Make sure the invalidated state persists across server restart */ + ReplicationSlotMarkDirty(); + ReplicationSlotSave(); + slot_updated = true; + } ... } Do we need to copy the 'invalidated' from remote to local if both are same? I think this will happen for each slot each time because normally slots won't be invalidated ones, so there is needless writes. 6. + * Returns TRUE if any of the slots gets updated in this sync-cycle. + */ +static bool +synchronize_slots(WalReceiverConn *wrconn) ... ... 
+void +SyncReplicationSlots(WalReceiverConn *wrconn) +{ + PG_TRY(); + { + validate_primary_slot_name(wrconn); + + (void) synchronize_slots(wrconn); For the purpose of 0001, synchronize_slots() doesn't seems to use return value. So, I suggest to change it accordingly and move the return value in the required patch. 7. + /* + * The primary_slot_name is not set yet or WALs not received yet. + * Synchronization is not possible if the walreceiver is not started. + */ + latestWalEnd = GetWalRcvLatestWalEnd(); + SpinLockAcquire(&WalRcv->mutex); + if ((WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(latestWalEnd)) + { + SpinLockRelease(&WalRcv->mutex); + return false; For the purpose of 0001, we should give WARNING here. -- With Regards, Amit Kapila.
On Thu, Feb 8, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v80_2-0001. Thanks for the feedback Peter. Addressed the comments in v81. Attached patch001 for early feedback. Rest of the patches need rebasing and thus will post those later. It also addresses comments by Amit in [1]. [1]: https://www.postgresql.org/message-id/CAA4eK1Ldhh_kf-qG-m5BKY0R1SkdBSx5j%2BEzwpie%2BH9GPWWOYA%40mail.gmail.com thanks Shveta
Attachment
On Thu, Feb 8, 2024 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few comments on 0001 > =================== Thanks Amit. Addressed these in v81. > 1. > + * the slots on the standby and synchronize them. This is done on every call > + * to SQL function pg_sync_replication_slots. > > > > I think the second sentence can be slightly changed to: "This is done > by a call to SQL function pg_sync_replication_slots." or "One can call > SQL function pg_sync_replication_slots to invoke this functionality." Done. > 2. > +update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid) > { > ... > + SpinLockAcquire(&slot->mutex); > + slot->data.plugin = plugin_name; > + slot->data.database = remote_dbid; > + slot->data.two_phase = remote_slot->two_phase; > + slot->data.failover = remote_slot->failover; > + slot->data.restart_lsn = remote_slot->restart_lsn; > + slot->data.confirmed_flush = remote_slot->confirmed_lsn; > + slot->data.catalog_xmin = remote_slot->catalog_xmin; > + slot->effective_catalog_xmin = remote_slot->catalog_xmin; > + SpinLockRelease(&slot->mutex); > + > + if (remote_slot->catalog_xmin != slot->data.catalog_xmin) > + ReplicationSlotsComputeRequiredXmin(false); > + > + if (remote_slot->restart_lsn != slot->data.restart_lsn) > + ReplicationSlotsComputeRequiredLSN(); > ... > } > > How is it possible that after assigning the values from remote_slot > they can differ from local slot values? It was a mistake while comment fixing in previous versions. Corrected it now. Thanks for catching. > 3. > + /* > + * Find the oldest existing WAL segment file. > + * > + * Normally, we can determine it by using the last removed segment > + * number. However, if no WAL segment files have been removed by a > + * checkpoint since startup, we need to search for the oldest segment > + * file currently existing in XLOGDIR. > + */ > + oldest_segno = XLogGetLastRemovedSegno() + 1; > + > + if (oldest_segno == 1) > + oldest_segno = XLogGetOldestSegno(0); > > I feel this way isn't there a risk that XLogGetOldestSegno() will get > us the seg number from some previous timeline which won't make sense > to compare segno in reserve_wal_for_local_slot. Shouldn't you need to > fetch the current timeline and send as a parameter to this function as > that is the timeline on which standby is communicating with primary. Yes, modified it. > 4. > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + ereport(ERROR, > + errmsg("skipping slot synchronization as the received slot sync" > > I think the internal errors should be reported with elog as you have > done at other palces in the patch. Done. > 5. > +synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > { > ... > + /* > + * Copy the invalidation cause from remote only if local slot is not > + * invalidated locally, we don't want to overwrite existing one. > + */ > + if (slot->data.invalidated == RS_INVAL_NONE) > + { > + SpinLockAcquire(&slot->mutex); > + slot->data.invalidated = remote_slot->invalidated; > + SpinLockRelease(&slot->mutex); > + > + /* Make sure the invalidated state persists across server restart */ > + ReplicationSlotMarkDirty(); > + ReplicationSlotSave(); > + slot_updated = true; > + } > ... > } > > Do we need to copy the 'invalidated' from remote to local if both are > same? I think this will happen for each slot each time because > normally slots won't be invalidated ones, so there is needless writes. It is not needed everytime. Optimized it. 
Now we copy only if local_slot's 'invalidated' value is RS_INVAL_NONE while remote-slot's value != RS_INVAL_NONE. > 6. > + * Returns TRUE if any of the slots gets updated in this sync-cycle. > + */ > +static bool > +synchronize_slots(WalReceiverConn *wrconn) > ... > ... > > +void > +SyncReplicationSlots(WalReceiverConn *wrconn) > +{ > + PG_TRY(); > + { > + validate_primary_slot_name(wrconn); > + > + (void) synchronize_slots(wrconn); > > For the purpose of 0001, synchronize_slots() doesn't seems to use > return value. So, I suggest to change it accordingly and move the > return value in the required patch. Modified it. Also changed return values of all related internal functions which were returning slot_updated. > 7. > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(&WalRcv->mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return false; > > For the purpose of 0001, we should give WARNING here. I will fix it in the next version. Sorry, I somehow missed it this time. thanks Shveta
On Thu, Feb 8, 2024 at 4:31 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are some review comments for patch v80_2-0001. > > Thanks for the feedback Peter. Addressed the comments in v81. I missed mentioning that Hou-san helped in addressing some of these comments in v81. Thanks, Hou-san. thanks Shveta
Here are some review comments for patch v81-0001. ====== 1. GENERAL - ReplicationSlotInvalidationCause enum. I was thinking that the ReplicationSlotInvalidationCause should explicitly set RS_INVAL_NONE = 0 (it's zero anyway, but making it explicit with a comment /* Must be zero. */. will stop it from being changed in the future). ------ /* * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the * 'invalidated' field is set to a value other than _NONE. */ typedef enum ReplicationSlotInvalidationCause { RS_INVAL_NONE = 0, /* Must be zero. */ ... } ReplicationSlotInvalidationCause; ------ The reason to do this is because many places in the patch check for RS_INVAL_NONE, but if RS_INVAL_NONE == 0 is assured, all those code fragments can be simplified and IMO also become more readable. e.g. update_local_synced_slot() BEFORE Assert(slot->data.invalidated == RS_INVAL_NONE); AFTER Assert(!slot->data.invalidated); ~ e.g. local_sync_slot_required() BEFORE locally_invalidated = (remote_slot->invalidated == RS_INVAL_NONE) && (local_slot->data.invalidated != RS_INVAL_NONE); AFTER locally_invalidated = !remote_slot->invalidated && local_slot->data.invalidated; ~ e.g. synchronize_one_slot() BEFORE if (slot->data.invalidated == RS_INVAL_NONE && remote_slot->invalidated != RS_INVAL_NONE) AFTER if (!slot->data.invalidated && remote_slot->invalidated; BEFORE /* Skip the sync of an invalidated slot */ if (slot->data.invalidated != RS_INVAL_NONE) AFTER /* Skip the sync of an invalidated slot */ if (slot->data.invalidated) BEFORE /* Skip creating the local slot if remote_slot is invalidated already */ if (remote_slot->invalidated != RS_INVAL_NONE) AFTER /* Skip creating the local slot if remote_slot is invalidated already */ if (remote_slot->invalidated) ~ e.g. synchronize_slots() BEFORE if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) || XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) || !TransactionIdIsValid(remote_slot->catalog_xmin)) && remote_slot->invalidated == RS_INVAL_NONE) AFTER if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) || XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) || !TransactionIdIsValid(remote_slot->catalog_xmin)) && !remote_slot->invalidated) ====== src/backend/replication/logical/slotsync.c 2. update_local_synced_slot + if (strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0 && + remote_dbid == slot->data.database && + !xmin_changed && !restart_lsn_changed && + remote_slot->two_phase == slot->data.two_phase && + remote_slot->failover == slot->data.failover && + remote_slot->confirmed_lsn == slot->data.confirmed_flush) + return false; Consider rearranging the conditions to put the strcmp later -- e.g. might as well avoid the (more expensive?) strcmp if some of those boolean tests are already false. ~~~ 3. + /* + * There is a possibility of parallel database drop by startup + * process and re-creation of new slot by user in the small window + * between getting the slot to drop and locking the db. This new + * user-created slot may end up using the same shared memory as + * that of 'local_slot'. Thus check if local_slot is still the + * synced one before performing actual drop. + */ BEFORE There is a possibility of parallel database drop by startup process and re-creation of new slot by user in the small window between getting the slot to drop and locking the db. 
SUGGESTION In the small window between getting the slot to drop and locking the database, there is a possibility of a parallel database drop by the startup process or the creation of a new slot by the user. ~~~ 4. +/* + * Synchronize single slot to given position. + * + * This creates a new slot if there is no existing one and updates the + * metadata of the slot as per the data received from the primary server. + * + * The slot is created as a temporary slot and stays in the same state until the + * the remote_slot catches up with locally reserved position and local slot is + * updated. The slot is then persisted and is considered as sync-ready for + * periodic syncs. + */ /Synchronize single slot to given position./Synchronize a single slot to the given position./ ~~~ 5. synchronize_slots + /* + * The primary_slot_name is not set yet or WALs not received yet. + * Synchronization is not possible if the walreceiver is not started. + */ + latestWalEnd = GetWalRcvLatestWalEnd(); + SpinLockAcquire(&WalRcv->mutex); + if ((WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(latestWalEnd)) + { + SpinLockRelease(&WalRcv->mutex); + return; + } + SpinLockRelease(&WalRcv->mutex); The comment talks about the GUC "primary_slot_name", but the code is checking the WalRcv's slotname. It may be the same, but the difference is confusing. ~~~ 6. + /* + * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but slot + * is not invalidated, that means we have fetched the remote_slot in + * its RS_EPHEMERAL state itself. In such a case, avoid syncing it + * yet. We can always sync it in the next sync cycle when the + * remote_slot is persisted and has valid lsn(s) and xmin values. + * + * XXX: In future, if we plan to expose 'slot->data.persistency' in + * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL + * slots in the first place. + */ SUGGESTION (1st para) If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the slot is valid, that means we have fetched the remote_slot in its RS_EPHEMERAL state. In such a case, don't sync it; we can always sync it in the next ... ~~~ 7. + /* + * Use shared lock to prevent a conflict with + * ReplicationSlotsDropDBSlots(), trying to drop the same slot during + * a drop-database operation. + */ + LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); + + synchronize_one_slot(remote_slot, remote_dbid); + + UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); IMO remove the blank lines (e.g., you don't use this kind of formatting for spin locks) ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thursday, February 8, 2024 7:07 PM shveta malik <shveta.malik@gmail.com> > > On Thu, Feb 8, 2024 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Few comments on 0001 > > =================== > > 7. > > + /* > > + * The primary_slot_name is not set yet or WALs not received yet. > > + * Synchronization is not possible if the walreceiver is not started. > > + */ > > + latestWalEnd = GetWalRcvLatestWalEnd(); > > + SpinLockAcquire(&WalRcv->mutex); if ((WalRcv->slotname[0] == '\0') > > + || > > + XLogRecPtrIsInvalid(latestWalEnd)) > > + { > > + SpinLockRelease(&WalRcv->mutex); > > + return false; > > > > For the purpose of 0001, we should give WARNING here. Fixed. Here is the V82 patch set which includes the following changes: 0001 1. Fixed an oversight: the size of the shared memory for slot sync was not counted in CalculateShmemSize(). 2. Added a warning message if the walreceiver has not started yet. 3. Fixed the above comment. 0002 - 0003 Rebased 0004 1. Added more details clarifying that the user should run the second query on the standby after the primary is down. 2. Mentioned that the query needs to be run on the db that includes the failover subscription. Thanks, Shveta, for working on the changes. Best Regards, Hou zj
Attachment
On Fri, Feb 9, 2024 at 10:00 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the V82 patch set which includes the following changes: > +reserve_wal_for_local_slot(XLogRecPtr restart_lsn) { ... + /* + * Find the oldest existing WAL segment file. + * + * Normally, we can determine it by using the last removed segment + * number. However, if no WAL segment files have been removed by a + * checkpoint since startup, we need to search for the oldest segment + * file currently existing in XLOGDIR. + */ + oldest_segno = XLogGetLastRemovedSegno() + 1; + + if (oldest_segno == 1) + { + TimeLineID cur_timeline; + + GetWalRcvFlushRecPtr(NULL, &cur_timeline); + oldest_segno = XLogGetOldestSegno(cur_timeline); ... ... This means that if the restart_lsn of the slot is from a prior timeline, then the standby needs to wait longer to sync the slot. Ideally, that should be okay: how often can it happen that a slot's restart_lsn is from a timeline older than the current flush timeline on the standby? OTOH, in the prior version of the patch (v80_2-0001*), we searched for the oldest segment across all possible timelines via code like: +reserve_wal_for_local_slot(XLogRecPtr restart_lsn) { ... + */ + oldest_segno = XLogGetLastRemovedSegno() + 1; + + if (oldest_segno == 1) + oldest_segno = XLogGetOldestSegno(0); I don't see a problem either way, as in both scenarios this is a very rare case and doesn't seem to cause any problem, but I would like to know the opinion of others. -- With Regards, Amit Kapila.
On Fri, Feb 9, 2024 at 9:57 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v81-0001. > > ====== > > 1. GENERAL - ReplicationSlotInvalidationCause enum. > > I was thinking that the ReplicationSlotInvalidationCause should > explicitly set RS_INVAL_NONE = 0 (it's zero anyway, but making it > explicit with a comment /* Must be zero. */. will stop it from being > changed in the future). > > ------ > /* > * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the > * 'invalidated' field is set to a value other than _NONE. > */ > typedef enum ReplicationSlotInvalidationCause > { > RS_INVAL_NONE = 0, /* Must be zero. */ > ... > } ReplicationSlotInvalidationCause; > ------ > > The reason to do this is because many places in the patch check for > RS_INVAL_NONE, but if RS_INVAL_NONE == 0 is assured, all those code > fragments can be simplified and IMO also become more readable. > > e.g. update_local_synced_slot() > > BEFORE > Assert(slot->data.invalidated == RS_INVAL_NONE); > > AFTER > Assert(!slot->data.invalidated); > I find the current code style more intuitive. > > 5. synchronize_slots > > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(&WalRcv->mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return; > + } > + SpinLockRelease(&WalRcv->mutex); > > The comment talks about the GUC "primary_slot_name", but the code is > checking the WalRcv's slotname. It may be the same, but the difference > is confusing. > Yeah, in this case, it would be the same because we don't allow slot sync worker unless primary_slot_name is configured in which case WalRcv->slotname refers to primary_slot_name. However, I think it is better to explain here why slot synchronization is not possible or doesn't make sense till walreceiver starts streaming and in which case, won't it be sufficient to just check latestWalEnd? -- With Regards, Amit Kapila.
On Thu, Feb 8, 2024 at 8:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are some review comments for patch v80_2-0001. > > Thanks for the feedback Peter. Addressed the comments in v81. > Attached patch001 for early feedback. Rest of the patches need > rebasing and thus will post those later. > > It also addresses comments by Amit in [1]. Thank you for updating the patch! Here are random comments: --- + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot use replication slot \"%s\" for logical" + " decoding", NameStr(slot->data.name)), + errdetail("This slot is being synced from the primary server."), + errhint("Specify another replication slot.")); + I think it's better to use "synchronized" instead of "synced" for consistency with other places. --- We can create a temporary failover slot on the primary, but such a slot is not synchronized. Do we want to disallow creating it? --- + + /* + * Register the callback function to clean up the shared memory of slot + * synchronization. + */ + SlotSyncInitialize(); I think it would have a wider impact than expected. IIUC this callback is needed only for processes that call synchronize_slots(). Why do we want all processes to register this callback? --- + if (!valid) + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + /* translator: second %s is a GUC variable name */ + errdetail("The primary server slot \"%s\" specified by \"%s\" is not valid.", + PrimarySlotName, "primary_slot_name")); + I think that the detail message is not appropriate since the primary_slot_name could actually be a valid name. I think we can rephrase it to something like "The replication slot %s specified by %s does not exist on the primary server". Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
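For context, a rough sketch of the case raised in the second comment above (parameter names as described for the patched pg_create_logical_replication_slot; the slot names and plugin are illustrative):

-- A persistent failover-enabled slot, which is a candidate for synchronization:
SELECT pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', failover => true);

-- A temporary slot can seemingly also be marked failover, yet temporary slots
-- are never synchronized, which is the combination being questioned above:
SELECT pg_create_logical_replication_slot('tmp_failover_slot', 'pgoutput',
                                           temporary => true, failover => true);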
On Friday, February 9, 2024 2:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Feb 8, 2024 at 8:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > > > Here are some review comments for patch v80_2-0001. > > > > Thanks for the feedback Peter. Addressed the comments in v81. > > Attached patch001 for early feedback. Rest of the patches need > > rebasing and thus will post those later. > > > > It also addresses comments by Amit in [1]. > > Thank you for updating the patch! Here are random comments: Thanks for the comments! > > --- > + > + /* > + * Register the callback function to clean up the shared memory of > slot > + * synchronization. > + */ > + SlotSyncInitialize(); > > I think it would have a wider impact than expected. IIUC this callback is needed > only for processes that call synchronize_slots(). Why do we want all processes > to register this callback? I think the current style is similar to the ReplicationSlotInitialize() above it. For backends, both of them can only be used when the user calls slot SQL functions. So, I think it could be fine to register it in the general place, which also avoids registering the same callback again for the later slotsync worker patch. Another alternative is to register the callback when calling slotsync functions and unregister it after the function call. And register the callback in slotsyncworkmain() for the slotsync worker patch, although this may add a bit more code. Best Regards, Hou zj
FYI -- I checked patch v81-0001 to find which of the #includes are strictly needed. ====== src/backend/replication/logical/slotsync.c 1. +#include "postgres.h" + +#include <time.h> + +#include "access/genam.h" +#include "access/table.h" +#include "access/xlog_internal.h" +#include "access/xlogrecovery.h" +#include "catalog/pg_database.h" +#include "commands/dbcommands.h" +#include "libpq/pqsignal.h" +#include "pgstat.h" +#include "postmaster/bgworker.h" +#include "postmaster/fork_process.h" +#include "postmaster/interrupt.h" +#include "postmaster/postmaster.h" +#include "replication/logical.h" +#include "replication/logicallauncher.h" +#include "replication/walreceiver.h" +#include "replication/slotsync.h" +#include "storage/ipc.h" +#include "storage/lmgr.h" +#include "storage/procarray.h" +#include "tcop/tcopprot.h" +#include "utils/builtins.h" +#include "utils/fmgroids.h" +#include "utils/guc_hooks.h" +#include "utils/pg_lsn.h" +#include "utils/ps_status.h" +#include "utils/timeout.h" +#include "utils/varlena.h" Many of these #includes seem unnecessary. e.g. I was able to remove all those that are commented-out below, and the file still compiles OK with no warnings: #include "postgres.h" //#include <time.h> //#include "access/genam.h" //#include "access/table.h" #include "access/xlog_internal.h" #include "access/xlogrecovery.h" #include "catalog/pg_database.h" #include "commands/dbcommands.h" //#include "libpq/pqsignal.h" //#include "pgstat.h" //#include "postmaster/bgworker.h" //#include "postmaster/fork_process.h" //#include "postmaster/interrupt.h" //#include "postmaster/postmaster.h" #include "replication/logical.h" //#include "replication/logicallauncher.h" //#include "replication/walreceiver.h" #include "replication/slotsync.h" #include "storage/ipc.h" #include "storage/lmgr.h" #include "storage/procarray.h" //#include "tcop/tcopprot.h" #include "utils/builtins.h" //#include "utils/fmgroids.h" //#include "utils/guc_hooks.h" #include "utils/pg_lsn.h" //#include "utils/ps_status.h" //#include "utils/timeout.h" //#include "utils/varlena.h" ====== src/backend/replication/slot.c 2. #include "pgstat.h" +#include "replication/slotsync.h" #include "replication/slot.h" +#include "replication/walsender.h" #include "storage/fd.h" The #include "replication/walsender.h" seems to be unnecessary. ====== src/backend/replication/walsender.c 3. #include "replication/logical.h" +#include "replication/slotsync.h" #include "replication/slot.h" The #include "replication/slotsync.h" is needed, but only for Assert code: Assert(am_cascading_walsender || IsSyncingReplicationSlots()); So you could #ifdef around that #include if you wish to. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Feb 9, 2024 at 10:00 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > Few comments on 0001 =================== 1. Shouldn't pg_sync_replication_slots() check whether the user has replication privilege? 2. The function declarations in slotsync.h don't seem to be in the same order as they are defined in slotsync.c. For example, see ValidateSlotSyncParams(). The same is true for new functions exposed via walreceiver.h and walsender.h. Please check the patch for other such inconsistencies. 3. +# Wait for the standby to finish sync +$standby1->wait_for_log( + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub1_slot\" is sync-ready now/, + $offset); + +$standby1->wait_for_log( + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub2_slot\" is sync-ready now/, + $offset); + +# Confirm that the logical failover slots are created on the standby and are +# flagged as 'synced' +is($standby1->safe_psql('postgres', + q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot') AND synced;}), + "t", + 'logical slots have synced as true on standby'); Isn't the last test that queried pg_replication_slots sufficient? I think wait_for_log() would be required for slotsync worker or am I missing something? Apart from the above, I have modified a few comments in the attached. -- With Regards, Amit Kapila.
Attachment
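For reference, a sketch of how point 1 above plays out in practice (assuming the REPLICATION-privilege check under discussion and the failover/synced columns exposed by this patch series):

-- On the standby, run by a role with the REPLICATION attribute (or a superuser):
SELECT pg_sync_replication_slots();

-- Afterwards, the synchronized slots should be visible with synced = true:
SELECT slot_name, failover, synced
FROM pg_replication_slots;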
On Friday, February 9, 2024 6:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 9, 2024 at 10:00 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Few comments on 0001 > =================== > 1. Shouldn't pg_sync_replication_slots() check whether the user has replication > privilege? Yes, added. > > 2. The function declarations in slotsync.h don't seem to be in the same order as > they are defined in slotsync.c. For example, see ValidateSlotSyncParams(). The > same is true for new functions exposed via walreceiver.h and walsender.h. Please > check the patch for other such inconsistencies. I reordered the function declarations. > > 3. > +# Wait for the standby to finish sync > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub1_slot\" is sync-ready > +now/, $offset); > + > +$standby1->wait_for_log( > + qr/LOG: ( [A-Z0-9]+:)? newly created slot \"lsub2_slot\" is sync-ready > +now/, $offset); > + > +# Confirm that the logical failover slots are created on the standby > +and are # flagged as 'synced' > +is($standby1->safe_psql('postgres', > + q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN > ('lsub1_slot', 'lsub2_slot') AND synced;}), > + "t", > + 'logical slots have synced as true on standby'); > > Isn't the last test that queried pg_replication_slots sufficient? I think > wait_for_log() would be required for slotsync worker or am I missing something? I think it's not needed in 0001, so removed. > Apart from the above, I have modified a few comments in the attached. Thanks, it looks good to me, so applied. Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3] comments. Only 0001 is sent in this version, we will send other patches after rebasing. [1] https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBsTS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUamR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com Best Regards, Hou zj
Attachment
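For context on comment 1 above (the replication-privilege check for pg_sync_replication_slots()): such a check typically amounts to a few lines at the top of the SQL-callable function. A minimal sketch, assuming the check is made directly with has_rolreplication(); the helper name, its placement, and the exact error wording are illustrative rather than what the posted patch necessarily does:

#include "postgres.h"

#include "miscadmin.h"			/* GetUserId(), has_rolreplication() */

/*
 * Sketch only: refuse to synchronize slots unless the calling role has the
 * REPLICATION attribute.  The function name and messages are hypothetical.
 */
static void
check_replication_privilege(void)
{
	if (!has_rolreplication(GetUserId()))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("permission denied to synchronize replication slots"),
				 errdetail("Only roles with the REPLICATION attribute may call this function.")));
}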
On Friday, February 9, 2024 12:27 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for patch v81-0001. Thanks for the comments. > . GENERAL - ReplicationSlotInvalidationCause enum. > > I was thinking that the ReplicationSlotInvalidationCause should > explicitly set RS_INVAL_NONE = 0 (it's zero anyway, but making it > explicit with a comment / Must be zero. /. will stop it from being > changed in the future). I think the current code is better, so didn't change this. > 5. synchronize_slots > > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(&WalRcv->mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return; > + } > + SpinLockRelease(&WalRcv->mutex); > > The comment talks about the GUC "primary_slot_name", but the code is > checking the WalRcv's slotname. It may be the same, but the difference > is confusing. This part has been removed. > 7. > + /* > + * Use shared lock to prevent a conflict with > + * ReplicationSlotsDropDBSlots(), trying to drop the same slot during > + * a drop-database operation. > + */ > + LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); > + > + synchronize_one_slot(remote_slot, remote_dbid); > + > + UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, > + AccessShareLock); > > IMO remove the blank lines (e.g., you don't use this kind of formatting for spin > locks) I am not sure if it will look better, so didn't change this. Other comments look good. Best Regards, Hou zj
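For readers following the GENERAL comment above: the suggestion (declined here) was simply to pin the first enumerator and document it, roughly as below. The member list follows the invalidation causes present in replication/slot.h around this time; the per-member comments are illustrative.

/* Sketch of the suggested change only; not what the patch does. */
typedef enum ReplicationSlotInvalidationCause
{
	RS_INVAL_NONE = 0,			/* Must be zero. */
	RS_INVAL_WAL_REMOVED,		/* required WAL has been removed */
	RS_INVAL_HORIZON,			/* required rows have been removed */
	RS_INVAL_WAL_LEVEL,			/* wal_level insufficient for the slot */
} ReplicationSlotInvalidationCause;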
On Friday, February 9, 2024 4:13 PM Peter Smith <smithpb2250@gmail.com> wrote: > > FYI -- I checked patch v81-0001 to find which of the #includes are strictly needed. Thanks! > > 1. > ... > > Many of these #includes seem unnecessary. e.g. I was able to remove > all those that are commented-out below, and the file still compiles OK > with no warnings: Removed. > > > ====== > src/backend/replication/slot.c > > > > 2. > #include "pgstat.h" > +#include "replication/slotsync.h" > #include "replication/slot.h" > +#include "replication/walsender.h" > #include "storage/fd.h" > > The #include "replication/walsender.h" seems to be unnecessary. Removed. > > ====== > src/backend/replication/walsender.c > > 3. > #include "replication/logical.h" > +#include "replication/slotsync.h" > #include "replication/slot.h" > > The #include "replication/slotsync.h" is needed, but only for Assert code: > Assert(am_cascading_walsender || IsSyncingReplicationSlots()); > > So you could #ifdef around that #include if you wish to. I am not sure if it's necessary and didn't find similar coding, so didn't change. Best Regards, Hou zj
On Saturday, February 10, 2024 11:37 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V83 patch which addressed Peter[1][2], Amit and Sawada-san's[3] > comments. Only 0001 is sent in this version, we will send other patches after > rebasing. > > [1] > https://www.postgresql.org/message-id/CAHut%2BPvW8s6AYD2UD0xadM%2B > 3VqBkXP2LjD30LEGRkHUa-Szm%2BQ%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAHut%2BPv88vp9mNxX37c_Bc5FDBs > TS%2BdhV02Vgip9Wqwh7GBYSg%40mail.gmail.com > [3] > https://www.postgresql.org/message-id/CAD21AoDvyLu%3D2-mqfGn_T_3jUa > mR34w%2BsxKvYnVzKqTCpyq_FQ%40mail.gmail.com I noticed one cfbot failure that the slot is not synced when the standby is lagging behind the subscriber. I have modified the test to disable the sub before syncing to avoid this failure. Attach the V83_2 patch, no other code changes are included in this version. Best Regards, Hou zj
Attachment
On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, February 9, 2024 2:44 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Feb 8, 2024 at 8:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Thu, Feb 8, 2024 at 12:08 PM Peter Smith <smithpb2250@gmail.com> > > wrote: > > > > > > > > Here are some review comments for patch v80_2-0001. > > > > > > Thanks for the feedback Peter. Addressed the comments in v81. > > > Attached patch001 for early feedback. Rest of the patches need > > > rebasing and thus will post those later. > > > > > > It also addresses comments by Amit in [1]. > > > > Thank you for updating the patch! Here are random comments: > > Thanks for the comments! > > > > > --- > > + > > + /* > > + * Register the callback function to clean up the shared memory of > > slot > > + * synchronization. > > + */ > > + SlotSyncInitialize(); > > > > I think it would have a wider impact than expected. IIUC this callback is needed > > only for processes who calls synchronize_slots(). Why do we want all processes > > to register this callback? > > I think the current style is similar to the ReplicationSlotInitialize() above it. For backend, > both of them can only be used when user calls slot SQL functions. So, I think it could be fine to > register it at the general place which can also avoid registering the same again for the later > slotsync worker patch. Yes, but it seems to be a legitimate case since replication slot code involves many functions that need the callback to clear the flag. On the other hand, in the slotsync code, only one function, SyncReplicationSlots(), needs the callback at least in 0001 patch. > Another alternative is to register the callback when calling slotsync functions > and unregister it after the function call. And register the callback in > slotsyncworkmain() for the slotsync worker patch, although this may adds a few > more codes. Another idea is that SyncReplicationSlots() calls synchronize_slots() in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure to clear the flag in case of ERROR or FATAL. And the slotsync worker uses the before_shmem_callback to clear the flag. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > Another alternative is to register the callback when calling slotsync functions > > and unregister it after the function call. And register the callback in > > slotsyncworkmain() for the slotsync worker patch, although this may adds a few > > more codes. > > Another idea is that SyncReplicationSlots() calls synchronize_slots() > in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure > to clear the flag in case of ERROR or FATAL. And the slotsync worker > uses the before_shmem_callback to clear the flag. > +1. This sounds like a better way to clear the flag. -- With Regards, Amit Kapila.
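To make the agreed approach concrete: PG_ENSURE_ERROR_CLEANUP() registers the callback with before_shmem_exit() (covering FATAL and process exit) and wraps the protected section in PG_TRY/PG_CATCH, so the callback also runs before the error is re-thrown on ERROR. A minimal sketch of that pattern for clearing a shared "syncing" flag follows; the shared-struct layout, field names, and callback body are assumptions based on names mentioned in the thread, not the actual patch, and other steps of the real function are omitted:

#include "postgres.h"

#include "replication/walreceiver.h"	/* WalReceiverConn */
#include "storage/ipc.h"				/* PG_ENSURE_ERROR_CLEANUP */
#include "storage/spin.h"

/* Hypothetical shared state; the real SlotSyncCtxStruct may differ. */
typedef struct SlotSyncCtxStruct
{
	bool		syncing;		/* is a slot sync in progress? */
	slock_t		mutex;			/* protects 'syncing' */
} SlotSyncCtxStruct;

static SlotSyncCtxStruct *SlotSyncCtx;	/* points into shared memory */

extern void synchronize_slots(WalReceiverConn *wrconn);	/* as in the patch */

/* Cleanup callback; before_shmem_exit()-style signature (int code, Datum arg). */
static void
slotsync_failure_callback(int code, Datum arg)
{
	/*
	 * The connection is available via 'arg' if the callback needs it; here
	 * we only reset the shared flag so that a later sync is not blocked.
	 */
	SpinLockAcquire(&SlotSyncCtx->mutex);
	SlotSyncCtx->syncing = false;
	SpinLockRelease(&SlotSyncCtx->mutex);
}

void
SyncReplicationSlots(WalReceiverConn *wrconn)
{
	/* Registers the callback and opens a PG_TRY block. */
	PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
	{
		synchronize_slots(wrconn);	/* may ERROR out at any point */
	}
	/* Cancels the registration; on ERROR, runs the callback and re-throws. */
	PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn));
}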
On Saturday, February 10, 2024 9:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> > wrote: > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > Another alternative is to register the callback when calling > > > slotsync functions and unregister it after the function call. And > > > register the callback in > > > slotsyncworkmain() for the slotsync worker patch, although this may > > > adds a few more codes. > > > > Another idea is that SyncReplicationSlots() calls synchronize_slots() > > in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure > > to clear the flag in case of ERROR or FATAL. And the slotsync worker > > uses the before_shmem_callback to clear the flag. > > > > +1. This sounds like a better way to clear the flag. Agreed. Here is the V84 patch which addressed this. Apart from above, I removed the txn start/end codes from 0001 as they are used in the slotsync worker patch. And I also ran pgindent and pgperltidy for the patch. Best Regards, Hou zj
Attachment
On Sun, Feb 11, 2024 at 6:53 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Agreed. Here is the V84 patch which addressed this. > Few comments: ============= 1. Isn't the new function (pg_sync_replication_slots()) allowed to sync the slots from physical standby to another cascading standby? Won't it be better to simply disallow syncing slots on cascading standby to keep it consistent with slotsync worker behavior? 2. Previously, I commented to keep the declaration and definition of functions in the same order but I see that it still doesn't match in the below case: @@ -44,6 +46,7 @@ extern void WalSndWakeup(bool physical, bool logical); extern void WalSndInitStopping(void); extern void WalSndWaitStopping(void); extern void HandleWalSndInitStopping(void); +extern XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *tli); extern void WalSndRqstFileReload(void); I think we can keep the new declaration just before WalSndSignals(). That would be more consistent. 3. + <para> + True if this is a logical slot that was synced from a primary server. + </para> + <para> + On a hot standby, the slots with the synced column marked as true can + neither be used for logical decoding nor dropped by the user. The value I don't think we need a separate para here. Apart from this, I have made several cosmetic changes in the attached. Please include these in the next version unless you see any problems. -- With Regards, Amit Kapila.
Attachment
Hi, On Sun, Feb 11, 2024 at 01:23:19PM +0000, Zhijie Hou (Fujitsu) wrote: > On Saturday, February 10, 2024 9:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada <sawada.mshk@gmail.com> > > wrote: > > > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > Another alternative is to register the callback when calling > > > > slotsync functions and unregister it after the function call. And > > > > register the callback in > > > > slotsyncworkmain() for the slotsync worker patch, although this may > > > > adds a few more codes. > > > > > > Another idea is that SyncReplicationSlots() calls synchronize_slots() > > > in PG_ENSURE_ERROR_CLEANUP() block instead of PG_TRY(), to make sure > > > to clear the flag in case of ERROR or FATAL. And the slotsync worker > > > uses the before_shmem_callback to clear the flag. > > > > > > > +1. This sounds like a better way to clear the flag. > > Agreed. Here is the V84 patch which addressed this. > > Apart from above, I removed the txn start/end codes from 0001 as they are used > in the slotsync worker patch. And I also ran pgindent and pgperltidy for the > patch. > Thanks! A few random comments: 001 === " For the synchronization to work, it is mandatory to have a physical replication slot between the primary and the standby, " Maybe mention "primary_slot_name" here? 002 === + <para> + Synchronize the logical failover slots from the primary server to the standby server. should we say "logical failover replication slots" instead? 003 === + If, after executing the function, + <link linkend="guc-hot-standby-feedback"> + <varname>hot_standby_feedback</varname></link> is disabled on + the standby or the physical slot configured in + <link linkend="guc-primary-slot-name"> + <varname>primary_slot_name</varname></link> is + removed, I think another option that could lead to slot invalidation is if primary_slot_name is NULL or miss-configured. Indeed hot_standby_feedback would be working (for the catalog_xmin) but only as long as the standby is up and running. 004 === + on the standby. For the synchronization to work, it is mandatory to + have a physical replication slot between the primary and the standby, should we mention primary_slot_name here? 005 === + To resume logical replication after failover from the synced logical + slots, the subscription's 'conninfo' must be altered Only in a pub/sub context but not for other ways of using the logical replication slot(s). 006 === + neither be used for logical decoding nor dropped by the user what about "nor dropped manually"? 007 === +typedef struct SlotSyncCtxStruct +{ Should we remove "Struct" from the struct name? 008 === + ereport(LOG, + errmsg("dropped replication slot \"%s\" of dbid %d", + NameStr(local_slot->data.name), + local_slot->data.database)); We emit a message when an "invalidated" slot is dropped but not when we create a slot. Shouldn't we emit a message when we create a synced slot on the standby? I think that could be confusing to see "a drop" message not followed by "a create" one when it's expected (slot valid on the primary for example). 009 === Regarding 040_standby_failover_slots_sync.pl what about adding tests for? 
- synced slot invalidation (and ensure it's recreated once pg_sync_replication_slots() is called and when the slot in primary is valid) - cannot enable failover for a temporary replication slot - replication slots can only be synchronized from a standby server Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Feb 12, 2024 at 3:33 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > A few random comments: > > > 003 === > > + If, after executing the function, > + <link linkend="guc-hot-standby-feedback"> > + <varname>hot_standby_feedback</varname></link> is disabled on > + the standby or the physical slot configured in > + <link linkend="guc-primary-slot-name"> > + <varname>primary_slot_name</varname></link> is > + removed, > > I think another option that could lead to slot invalidation is if primary_slot_name > is NULL or miss-configured. > If the primary_slot_name is NULL then the function will error out. So, not sure, if we need to say anything explicitly here. > Indeed hot_standby_feedback would be working > (for the catalog_xmin) but only as long as the standby is up and running. > ... > > 005 === > > + To resume logical replication after failover from the synced logical > + slots, the subscription's 'conninfo' must be altered > > Only in a pub/sub context but not for other ways of using the logical replication > slot(s). > Right, but what additional information do you want here? I thought we were speaking about the in-build logical replication here so this is okay. > > 008 === > > + ereport(LOG, > + errmsg("dropped replication slot \"%s\" of dbid %d", > + NameStr(local_slot->data.name), > + local_slot->data.database)); > > We emit a message when an "invalidated" slot is dropped but not when we create > a slot. Shouldn't we emit a message when we create a synced slot on the standby? > > I think that could be confusing to see "a drop" message not followed by "a create" > one when it's expected (slot valid on the primary for example). > Isn't the below message for sync-ready slot sufficient? Otherwise, in most cases, we will LOG multiple similar messages. + ereport(LOG, + errmsg("newly created slot \"%s\" is sync-ready now", + remote_slot->name)); -- With Regards, Amit Kapila.
Hi, On Mon, Feb 12, 2024 at 04:19:33PM +0530, Amit Kapila wrote: > On Mon, Feb 12, 2024 at 3:33 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > A few random comments: > > > > > > 003 === > > > > + If, after executing the function, > > + <link linkend="guc-hot-standby-feedback"> > > + <varname>hot_standby_feedback</varname></link> is disabled on > > + the standby or the physical slot configured in > > + <link linkend="guc-primary-slot-name"> > > + <varname>primary_slot_name</varname></link> is > > + removed, > > > > I think another option that could lead to slot invalidation is if primary_slot_name > > is NULL or miss-configured. > > > > If the primary_slot_name is NULL then the function will error out. Yeah right, it had to be non NULL initially so we know there is a physical slot (if not dropped) that should prevent conflicts at the first place (should hsf be on). Please forget about comment 003 then. > > > > 005 === > > > > + To resume logical replication after failover from the synced logical > > + slots, the subscription's 'conninfo' must be altered > > > > Only in a pub/sub context but not for other ways of using the logical replication > > slot(s). > > > > Right, but what additional information do you want here? I thought we > were speaking about the in-build logical replication here so this is > okay. The "Logical Decoding Concepts" sub-chapter also mentions "Logical decoding clients" so I was not sure the part added in the patch was for in-build logical replication only. Or maybe just reword that way "In case of in-build logical replication, to resume after failover from the synced......"? > > > > > 008 === > > > > + ereport(LOG, > > + errmsg("dropped replication slot \"%s\" of dbid %d", > > + NameStr(local_slot->data.name), > > + local_slot->data.database)); > > > > We emit a message when an "invalidated" slot is dropped but not when we create > > a slot. Shouldn't we emit a message when we create a synced slot on the standby? > > > > I think that could be confusing to see "a drop" message not followed by "a create" > > one when it's expected (slot valid on the primary for example). > > > > Isn't the below message for sync-ready slot sufficient? Otherwise, in > most cases, we will LOG multiple similar messages. > > + ereport(LOG, > + errmsg("newly created slot \"%s\" is sync-ready now", > + remote_slot->name)); Yes it is sufficient if we reach it. For example during some test, I was able to go through this code path: Breakpoint 2, update_and_persist_local_synced_slot (remote_slot=0x56450e7c49c0, remote_dbid=5) at slotsync.c:340 340 ReplicationSlot *slot = MyReplicationSlot; (gdb) n 346 if (remote_slot->restart_lsn < slot->data.restart_lsn || (gdb) 347 TransactionIdPrecedes(remote_slot->catalog_xmin, (gdb) 346 if (remote_slot->restart_lsn < slot->data.restart_lsn || (gdb) 358 return; means exiting from update_and_persist_local_synced_slot() without reaching the "newly created slot" message (the slot on the primary was "inactive"). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Monday, February 12, 2024 6:03 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Sun, Feb 11, 2024 at 01:23:19PM +0000, Zhijie Hou (Fujitsu) wrote: > > On Saturday, February 10, 2024 9:10 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada > > > <sawada.mshk@gmail.com> > > > wrote: > > > > > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > > Another alternative is to register the callback when calling > > > > > slotsync functions and unregister it after the function call. > > > > > And register the callback in > > > > > slotsyncworkmain() for the slotsync worker patch, although this > > > > > may adds a few more codes. > > > > > > > > Another idea is that SyncReplicationSlots() calls > > > > synchronize_slots() in PG_ENSURE_ERROR_CLEANUP() block instead of > > > > PG_TRY(), to make sure to clear the flag in case of ERROR or > > > > FATAL. And the slotsync worker uses the before_shmem_callback to clear > the flag. > > > > > > > > > > +1. This sounds like a better way to clear the flag. > > > > Agreed. Here is the V84 patch which addressed this. > > > > Apart from above, I removed the txn start/end codes from 0001 as they > > are used in the slotsync worker patch. And I also ran pgindent and > > pgperltidy for the patch. > > > > Thanks! > > A few random comments: Thanks for the comments. > > 001 === > > " > For > the synchronization to work, it is mandatory to have a physical replication slot > between the primary and the standby, " > > Maybe mention "primary_slot_name" here? Added. > > 002 === > > + <para> > + Synchronize the logical failover slots from the primary server to the > standby server. > > should we say "logical failover replication slots" instead? Changed. > > 003 === > > + If, after executing the function, > + <link linkend="guc-hot-standby-feedback"> > + <varname>hot_standby_feedback</varname></link> is disabled > on > + the standby or the physical slot configured in > + <link linkend="guc-primary-slot-name"> > + <varname>primary_slot_name</varname></link> is > + removed, > > I think another option that could lead to slot invalidation is if primary_slot_name > is NULL or miss-configured. Indeed hot_standby_feedback would be working > (for the catalog_xmin) but only as long as the standby is up and running. I didn't change this based on the discussion. > > 004 === > > + on the standby. For the synchronization to work, it is mandatory to > + have a physical replication slot between the primary and the > + standby, > > should we mention primary_slot_name here? Added. > > 005 === > > + To resume logical replication after failover from the synced logical > + slots, the subscription's 'conninfo' must be altered > > Only in a pub/sub context but not for other ways of using the logical replication > slot(s). I am not very sure about this, because the 3-rd part logicalrep can also have their own replication origin, so I didn't change for now, but will think over this. > > 006 === > > + neither be used for logical decoding nor dropped by the user > > what about "nor dropped manually"? Changed. > > 007 === > > +typedef struct SlotSyncCtxStruct > +{ > > Should we remove "Struct" from the struct name? The name was named based on some other comment to be consistent with LogicalReplCtxStruct, so I didn't change this. If other also prefer without struct, we can change it later. 
> 008 === > > + ereport(LOG, > + errmsg("dropped replication slot > \"%s\" of dbid %d", > + > NameStr(local_slot->data.name), > + > + local_slot->data.database)); > > We emit a message when an "invalidated" slot is dropped but not when we > create a slot. Shouldn't we emit a message when we create a synced slot on the > standby? > > I think that could be confusing to see "a drop" message not followed by "a > create" > one when it's expected (slot valid on the primary for example). I think we will report "sync-ready" for a newly synced slot. For newly created temporary slots, I am not sure whether we need to log them, because they will be dropped on promotion anyway. But if others also prefer to log, I am fine with that. > > 009 === > > Regarding 040_standby_failover_slots_sync.pl what about adding tests for? > > - synced slot invalidation (and ensure it's recreated once > pg_sync_replication_slots() is called and when the slot in primary is valid) Will try this in the next version. > - cannot enable failover for a temporary replication slot Added. > - replication slots can only be synchronized from a standby server Added. Best Regards, Hou zj
On Monday, February 12, 2024 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Feb 11, 2024 at 6:53 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Agreed. Here is the V84 patch which addressed this. > > > > Few comments: > ============= > 1. Isn't the new function (pg_sync_replication_slots()) allowed to sync the slots > from physical standby to another cascading standby? > Won't it be better to simply disallow syncing slots on cascading standby to keep > it consistent with slotsync worker behavior? > > 2. > Previously, I commented to keep the declaration and definition of functions in > the same order but I see that it still doesn't match in the below case: > > @@ -44,6 +46,7 @@ extern void WalSndWakeup(bool physical, bool logical); > extern void WalSndInitStopping(void); extern void WalSndWaitStopping(void); > extern void HandleWalSndInitStopping(void); > +extern XLogRecPtr GetStandbyFlushRecPtr(TimeLineID *tli); > extern void WalSndRqstFileReload(void); > > I think we can keep the new declaration just before WalSndSignals(). > That would be more consistent. > > 3. > + <para> > + True if this is a logical slot that was synced from a primary server. > + </para> > + <para> > + On a hot standby, the slots with the synced column marked as true can > + neither be used for logical decoding nor dropped by the user. > + The value > > I don't think we need a separate para here. > > Apart from this, I have made several cosmetic changes in the attached. > Please include these in the next version unless you see any problems. Thanks for the comments, I have addressed them. Here is the new version patch which addressed above and most of Bertrand's comments. TODO: trying to add one test for the case the slot is valid on primary while the synced slots is invalidated on the standby. Best Regards, Houzj
Attachment
On Tue, Feb 13, 2024 at 6:45 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, February 12, 2024 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Thanks for the comments, I have addressed them. > > Here is the new version patch which addressed above and > most of Bertrand's comments. Thanks for the patch. I am trying to run valgrind on patch001. I followed the steps given in [1]. It ended up generating 850 log files. Is there a way to figure out whether we have a memory-related problem without going through each log file manually? I also tried running the steps with '--leak-check=summary' (in the first run, it was '--leak-check=no' as suggested in the wiki) with and without the patch and tried comparing the results manually for a few of them. I found the output more or less the same. But this is a mammoth task if we have to do it manually for 850 files. So any pointers here? For reference:
Sample log file with '--leak-check=no'
==00:00:08:44.321 250746== HEAP SUMMARY:
==00:00:08:44.321 250746== in use at exit: 1,298,274 bytes in 290 blocks
==00:00:08:44.321 250746== total heap usage: 11,958 allocs, 7,005 frees, 8,175,630 bytes allocated
==00:00:08:44.321 250746==
==00:00:08:44.321 250746== For a detailed leak analysis, rerun with: --leak-check=full
==00:00:08:44.321 250746==
==00:00:08:44.321 250746== For lists of detected and suppressed errors, rerun with: -s
==00:00:08:44.321 250746== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Sample log file with '--leak-check=summary'
==00:00:00:27.300 265785== HEAP SUMMARY:
==00:00:00:27.300 265785== in use at exit: 1,929,907 bytes in 310 blocks
==00:00:00:27.300 265785== total heap usage: 71,677 allocs, 7,754 frees, 95,750,897 bytes allocated
==00:00:00:27.300 265785==
==00:00:00:27.394 265785== LEAK SUMMARY:
==00:00:00:27.394 265785== definitely lost: 20,507 bytes in 171 blocks
==00:00:00:27.394 265785== indirectly lost: 16,419 bytes in 61 blocks
==00:00:00:27.394 265785== possibly lost: 354,670 bytes in 905 blocks
==00:00:00:27.394 265785== still reachable: 592,586 bytes in 1,473 blocks
==00:00:00:27.394 265785== suppressed: 0 bytes in 0 blocks
==00:00:00:27.394 265785== Rerun with --leak-check=full to see details of leaked memory
==00:00:00:27.394 265785==
==00:00:00:27.394 265785== For lists of detected and suppressed errors, rerun with: -s
==00:00:00:27.394 265785== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[1]: https://wiki.postgresql.org/wiki/Valgrind thanks Shveta
On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the new version patch which addressed above and most of Bertrand's > comments. > > TODO: trying to add one test for the case the slot is valid on primary while the > synced slots is invalidated on the standby. Here is the V85_2 patch set, which adds the test and fixes one typo; there are no other code changes. Best Regards, Hou zj
Attachment
On Tue, Feb 13, 2024 at 9:38 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the V85_2 patch set that added the test and fixed one typo, > there are no other code changes. > Few comments on the latest changes: ============================== 1. +# Confirm that the invalidated slot has been dropped. +$standby1->wait_for_log(qr/dropped replication slot "lsub1_slot" of dbid 5/, + $log_offset); Is it okay to hardcode dbid 5? I am a bit worried that it can lead to instability in the test. 2. +check_primary_info(WalReceiverConn *wrconn, int elevel) +{ .. + bool primary_info_valid; I don't think for 0001, we need an elevel as an argument, so let's remove it. Additionally, can we change the variable name primary_info_valid to primary_slot_valid? Also, can we change the function name to validate_remote_info() as the remote can be both primary or standby? 3. +SyncReplicationSlots(WalReceiverConn *wrconn) +{ + PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn)); + { + check_primary_info(wrconn, ERROR); + + synchronize_slots(wrconn); + } + PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, PointerGetDatum(wrconn)); + + walrcv_disconnect(wrconn); It is better to disconnect in the caller where we have made the connection. -- With Regards, Amit Kapila.
Attachment
On Fri, Feb 9, 2024 at 10:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > +reserve_wal_for_local_slot(XLogRecPtr restart_lsn) > { > ... > + /* > + * Find the oldest existing WAL segment file. > + * > + * Normally, we can determine it by using the last removed segment > + * number. However, if no WAL segment files have been removed by a > + * checkpoint since startup, we need to search for the oldest segment > + * file currently existing in XLOGDIR. > + */ > + oldest_segno = XLogGetLastRemovedSegno() + 1; > + > + if (oldest_segno == 1) > + { > + TimeLineID cur_timeline; > + > + GetWalRcvFlushRecPtr(NULL, &cur_timeline); > + oldest_segno = XLogGetOldestSegno(cur_timeline); > ... > ... > > This means that if the restart_lsn of the slot is from the prior > timeline then the standby needs to wait for longer times to sync the > slot. Ideally, it should be okay because I don't think even if > restart_lsn of the slot may be from some prior timeline than the > current flush timeline on standby, how often that case can happen? I tested this behaviour on v85 patch, it is working as expected i.e. if remot_slot's lsn belongs to a prior timeline then on executing pg_sync_replication_slots() function, it creates a temporary slot and waits for primary to catch up. And once primary catches up, the next execution of SQL function persistes the slot and syncs it. Setup: primary-->standby1-->standby2 Steps: 1) Insert data on primary. It gets replicated to both standbys. 2) Create logical slot on primary and execute pg_sync_replication_slots() on standby1. The slot gets synced and persisted on standby1. 3) Shutdown standby2. 4) Insert data on primary. It gets replicated to standby1. 5) Shutdown primary and promote standby1. 6) Insert some data on standby1/new primary directly. 7) Start standby2: It now needs to catch up old data of timeline1 (from step 4) + new data of timeline2 (from step 6) . It does that. On reaching the end of the old timeline, walreceiver gets restarted and starts streaming using the new timeline. 8) Execute pg_sync_replication_slots() on standby2 to sync the slot. Now remote_slot's lsn belongs to a prior timeline on standby2. In my test-run, remote_slot's lsn belonged to segno=4 on standby2, while the oldest segno of current_timline(2) was 6. Thus it created the slot locally with lsn belonging to the oldest segno 6 of cur_timeline(2) but did not persist it as remote_slot's lsn was behind. 9) Now on standby1/new-primary, advance the logical slot by calling pg_replication_slot_advance(). 10) Execute pg_sync_replication_slots() again on standby2, now the local temporary slot gets persisted as the restart_lsn of primary has caught up. thanks Shveta
Hi, On Tue, Feb 13, 2024 at 04:08:23AM +0000, Zhijie Hou (Fujitsu) wrote: > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > Here is the new version patch which addressed above and most of Bertrand's > > comments. > > > > TODO: trying to add one test for the case the slot is valid on primary while the > > synced slots is invalidated on the standby. > > Here is the V85_2 patch set that added the test and fixed one typo, > there are no other code changes. Thanks! Out of curiosity I ran a code coverage and the result for slotsync.c can be found in [1]. It appears that: - only one function is not covered (slotsync_failure_callback()). - 84% of the slotsync.c code is covered, the parts that are not are mainly related to "errors". Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 796, 808) [1]: https://htmlpreview.github.io/?https://raw.githubusercontent.com/bdrouvot/pg_code_coverage/main/src/backend/replication/logical/slotsync.c.gcov.html Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 04:08:23AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > > > Here is the new version patch which addressed above and most of Bertrand's > > > comments. > > > > > > TODO: trying to add one test for the case the slot is valid on primary while the > > > synced slots is invalidated on the standby. > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > Thanks! > > Out of curiosity I ran a code coverage and the result for slotsync.c can be > found in [1]. > > It appears that: > > - only one function is not covered (slotsync_failure_callback()). > - 84% of the slotsync.c code is covered, the parts that are not are mainly > related to "errors". > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 796, > 808) > All these additional line numbers mentioned by you are ERROR paths. I think if we want we can easily cover most of those but I am not sure if there is a benefit to cover each error path. -- With Regards, Amit Kapila.
On Tuesday, February 13, 2024 2:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 9:38 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > > > Few comments on the latest changes: Thanks for the comments. > ============================== > 1. > +# Confirm that the invalidated slot has been dropped. > +$standby1->wait_for_log(qr/dropped replication slot "lsub1_slot" of > +dbid 5/, $log_offset); > > Is it okay to hardcode dbid 5? I am a bit worried that it can lead to instability in > the test. > > 2. > +check_primary_info(WalReceiverConn *wrconn, int elevel) { > .. > + bool primary_info_valid; > > I don't think for 0001, we need an elevel as an argument, so let's remove it. > Additionally, can we change the variable name primary_info_valid to > primary_slot_valid? Also, can we change the function name to > validate_remote_info() as the remote can be both primary or standby? > > 3. > +SyncReplicationSlots(WalReceiverConn *wrconn) { > +PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, > +PointerGetDatum(wrconn)); { check_primary_info(wrconn, ERROR); > + > + synchronize_slots(wrconn); > + } > + PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback, > PointerGetDatum(wrconn)); > + > + walrcv_disconnect(wrconn); > > It is better to disconnect in the caller where we have made the connection. All above comments look good to me. Here is the V86 patch that addressed above. This version also includes some other minor changes: 1. Added few comments for the temporary slot creation and XLogGetOldestSegno. 2. Adjusted the doc for the SQL function. 3. Reordered two error messages in slot create function. 4. Fixed few typos. Thanks Shveta for off-list discussions. Best Regards, Hou zj
Attachment
On Tuesday, February 13, 2024 7:30 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 04:08:23AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > > Here is the new version patch which addressed above and most of > > > Bertrand's comments. > > > > > > TODO: trying to add one test for the case the slot is valid on > > > primary while the synced slots is invalidated on the standby. > > > > Here is the V85_2 patch set that added the test and fixed one typo, > > there are no other code changes. > > Thanks! > > Out of curiosity I ran a code coverage and the result for slotsync.c can be found > in [1]. > > It appears that: > > - only one function is not covered (slotsync_failure_callback()). Thanks for the test! I think slotsync_failure_callback can be covered more easily in the next slotsync worker patch (on worker exit); I will post that after rebasing. Best Regards, Hou zj
Hi, On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote: > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > - 84% of the slotsync.c code is covered, the parts that are not are mainly > > related to "errors". > > > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 796, > > 808) > > > > All these additional line numbers mentioned by you are ERROR paths. I > think if we want we can easily cover most of those but I am not sure > if there is a benefit to cover each error path. Yeah, I think 731, 739 and one among the remaining ones mentioned up-thread should be enough, thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Feb 13, 2024 at 9:25 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote: > > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > - 84% of the slotsync.c code is covered, the parts that are not are mainly > > > related to "errors". > > > > > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 796, > > > 808) > > > > > > > All these additional line numbers mentioned by you are ERROR paths. I > > think if we want we can easily cover most of those but I am not sure > > if there is a benefit to cover each error path. > > Yeah, I think 731, 739 and one among the remaining ones mentioned up-thread should > be enough, thoughts? > I don't know how beneficial those selective ones would be but if I have to pick a few among those then I would pick the ones at 731 and 808. The reason is that 731 is related to cascading standby restriction which we may uplift in the future and at that time one needs to be careful about the behavior, for 808 as well, in the future, we may have a separate GUC for slot_db_name. These may not be good enough reasons as to why we add tests for these ERROR cases but not for others, however, if we have to randomly pick a few among all ERROR paths, these seem better to me than others. -- With Regards, Amit Kapila.
On Wednesday, February 14, 2024 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 13, 2024 at 9:25 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote: > > > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > - 84% of the slotsync.c code is covered, the parts that are not > > > > are mainly related to "errors". > > > > > > > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, > > > > 778, 786, 796, > > > > 808) > > > > > > > > > > All these additional line numbers mentioned by you are ERROR paths. > > > I think if we want we can easily cover most of those but I am not > > > sure if there is a benefit to cover each error path. > > > > Yeah, I think 731, 739 and one among the remaining ones mentioned > > up-thread should be enough, thoughts? > > > > I don't know how beneficial those selective ones would be but if I have to pick a > few among those then I would pick the ones at 731 and 808. The reason is that > 731 is related to cascading standby restriction which we may uplift in the future > and at that time one needs to be careful about the behavior, for 808 as well, in > the future, we may have a separate GUC for slot_db_name. These may not be > good enough reasons as to why we add tests for these ERROR cases but not for > others, however, if we have to randomly pick a few among all ERROR paths, > these seem better to me than others. Here is V87 patch that adds test for the suggested cases. Best Regards, Hou zj
Attachment
On Wed, Feb 14, 2024 at 9:34 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is V87 patch that adds test for the suggested cases. > I have pushed this patch and it leads to a BF failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-02-14%2004%3A43%3A37 The test failures are: # Failed test 'logical decoding is not allowed on synced slot' # at /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 272. # Failed test 'synced slot on standby cannot be altered' # at /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 281. # Failed test 'synced slot on standby cannot be dropped' # at /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 287. The reason is that in LOGs, we see a different ERROR message than what is expected: 2024-02-14 04:52:32.916 UTC [1767765][client backend][3/4:0] ERROR: replication slot "lsub1_slot" is active for PID 1760871 Now, we see the slot still active because a test before these tests (# Test that if the synchronized slot is invalidated while the remote slot is still valid, ....) is not able to successfully persist the slot and the synced temporary slot remains active. The reason is clear by referring to below standby LOGS: LOG: connection authorized: user=bf database=postgres application_name=040_standby_failover_slots_sync.pl LOG: statement: SELECT pg_sync_replication_slots(); LOG: dropped replication slot "lsub1_slot" of dbid 5 STATEMENT: SELECT pg_sync_replication_slots(); ... SELECT conflict_reason IS NULL AND synced FROM pg_replication_slots WHERE slot_name = 'lsub1_slot'; In the above LOGs, we should ideally see: "newly created slot "lsub1_slot" is sync-ready now" after the "LOG: dropped replication slot "lsub1_slot" of dbid 5" but lack of that means the test didn't accomplish what it was supposed to. Ideally, the same test should have failed but the pass criteria for the test failed to check whether the slot is persisted or not. The probable reason for failure is that remote_slot's restart_lsn lags behind the oldest WAL segment on standby. Now, in the test, we do ensure that the publisher and subscriber are caught up by following steps: # Enable the subscription to let it catch up to the latest wal position $subscriber1->safe_psql('postgres', "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); $primary->wait_for_catchup('regress_mysub1'); However, this doesn't guarantee that restart_lsn is moved to a position new enough that standby has a WAL corresponding to it. One easy fix is to re-create the subscription with the same slot_name after we have ensured that the slot has been invalidated on standby so that a new restart_lsn is assigned to the slot but it is better to analyze some more why the slot's restart_lsn hasn't moved enough only sometimes. -- With Regards, Amit Kapila.
On Wed, Feb 14, 2024 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 14, 2024 at 9:34 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Here is V87 patch that adds test for the suggested cases. > > > > I have pushed this patch and it leads to a BF failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-02-14%2004%3A43%3A37 > > The test failures are: > # Failed test 'logical decoding is not allowed on synced slot' > # at /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 272. > # Failed test 'synced slot on standby cannot be altered' > # at /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 281. > # Failed test 'synced slot on standby cannot be dropped' > # at /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 287. > > The reason is that in LOGs, we see a different ERROR message than what > is expected: > 2024-02-14 04:52:32.916 UTC [1767765][client backend][3/4:0] ERROR: > replication slot "lsub1_slot" is active for PID 1760871 > > Now, we see the slot still active because a test before these tests (# > Test that if the synchronized slot is invalidated while the remote > slot is still valid, ....) is not able to successfully persist the > slot and the synced temporary slot remains active. > > The reason is clear by referring to below standby LOGS: > > LOG: connection authorized: user=bf database=postgres > application_name=040_standby_failover_slots_sync.pl > LOG: statement: SELECT pg_sync_replication_slots(); > LOG: dropped replication slot "lsub1_slot" of dbid 5 > STATEMENT: SELECT pg_sync_replication_slots(); > ... > SELECT conflict_reason IS NULL AND synced FROM pg_replication_slots > WHERE slot_name = 'lsub1_slot'; > > In the above LOGs, we should ideally see: "newly created slot > "lsub1_slot" is sync-ready now" after the "LOG: dropped replication > slot "lsub1_slot" of dbid 5" but lack of that means the test didn't > accomplish what it was supposed to. Ideally, the same test should have > failed but the pass criteria for the test failed to check whether the > slot is persisted or not. > > The probable reason for failure is that remote_slot's restart_lsn lags > behind the oldest WAL segment on standby. Now, in the test, we do > ensure that the publisher and subscriber are caught up by following > steps: > # Enable the subscription to let it catch up to the latest wal position > $subscriber1->safe_psql('postgres', > "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); > > $primary->wait_for_catchup('regress_mysub1'); > > However, this doesn't guarantee that restart_lsn is moved to a > position new enough that standby has a WAL corresponding to it. > To ensure that restart_lsn has been moved to a recent position, we need to log XLOG_RUNNING_XACTS and make sure the same is processed as well by walsender. The attached patch does the required change. Hou-San can reproduce this problem by adding additional checkpoints in the test and after applying the attached it fixes the problem. Now, this patch is mostly based on the theory we formed based on LOGs on BF and a reproducer by Hou-San, so still, there is some chance that this doesn't fix the BF failures in which case I'll again look into those. -- With Regards, Amit Kapila.
Attachment
On Wednesday, February 14, 2024 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 14, 2024 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Feb 14, 2024 at 9:34 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > Here is V87 patch that adds test for the suggested cases. > > > > > > > I have pushed this patch and it leads to a BF failure: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&d > > t=2024-02-14%2004%3A43%3A37 > > > > The test failures are: > > # Failed test 'logical decoding is not allowed on synced slot' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 272. > > # Failed test 'synced slot on standby cannot be altered' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 281. > > # Failed test 'synced slot on standby cannot be dropped' > > # at > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > ailover_slots_sync.pl > > line 287. > > > > The reason is that in LOGs, we see a different ERROR message than what > > is expected: > > 2024-02-14 04:52:32.916 UTC [1767765][client backend][3/4:0] ERROR: > > replication slot "lsub1_slot" is active for PID 1760871 > > > > Now, we see the slot still active because a test before these tests (# > > Test that if the synchronized slot is invalidated while the remote > > slot is still valid, ....) is not able to successfully persist the > > slot and the synced temporary slot remains active. > > > > The reason is clear by referring to below standby LOGS: > > > > LOG: connection authorized: user=bf database=postgres > > application_name=040_standby_failover_slots_sync.pl > > LOG: statement: SELECT pg_sync_replication_slots(); > > LOG: dropped replication slot "lsub1_slot" of dbid 5 > > STATEMENT: SELECT pg_sync_replication_slots(); ... > > SELECT conflict_reason IS NULL AND synced FROM pg_replication_slots > > WHERE slot_name = 'lsub1_slot'; > > > > In the above LOGs, we should ideally see: "newly created slot > > "lsub1_slot" is sync-ready now" after the "LOG: dropped replication > > slot "lsub1_slot" of dbid 5" but lack of that means the test didn't > > accomplish what it was supposed to. Ideally, the same test should have > > failed but the pass criteria for the test failed to check whether the > > slot is persisted or not. > > > > The probable reason for failure is that remote_slot's restart_lsn lags > > behind the oldest WAL segment on standby. Now, in the test, we do > > ensure that the publisher and subscriber are caught up by following > > steps: > > # Enable the subscription to let it catch up to the latest wal > > position $subscriber1->safe_psql('postgres', > > "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); > > > > $primary->wait_for_catchup('regress_mysub1'); > > > > However, this doesn't guarantee that restart_lsn is moved to a > > position new enough that standby has a WAL corresponding to it. > > > > To ensure that restart_lsn has been moved to a recent position, we need to log > XLOG_RUNNING_XACTS and make sure the same is processed as well by > walsender. The attached patch does the required change. > > Hou-San can reproduce this problem by adding additional checkpoints in the > test and after applying the attached it fixes the problem. 
Now, this patch is > mostly based on the theory we formed based on LOGs on BF and a reproducer > by Hou-San, so still, there is some chance that this doesn't fix the BF failures in > which case I'll again look into those. I have verified that the patch can fix the issue on my machine (after adding a few more checkpoints before the slot invalidation test). I also added one more check in the test to confirm that the synced slot is not a temp slot. Here is the v2 patch. Best Regards, Hou zj
Attachment
Hi, On Wed, Feb 14, 2024 at 10:40:11AM +0000, Zhijie Hou (Fujitsu) wrote: > On Wednesday, February 14, 2024 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Feb 14, 2024 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Feb 14, 2024 at 9:34 AM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > Here is V87 patch that adds test for the suggested cases. > > > > > > > > > > I have pushed this patch and it leads to a BF failure: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&d > > > t=2024-02-14%2004%3A43%3A37 > > > > > > The test failures are: > > > # Failed test 'logical decoding is not allowed on synced slot' > > > # at > > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > > ailover_slots_sync.pl > > > line 272. > > > # Failed test 'synced slot on standby cannot be altered' > > > # at > > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > > ailover_slots_sync.pl > > > line 281. > > > # Failed test 'synced slot on standby cannot be dropped' > > > # at > > /home/bf/bf-build/flaviventris/HEAD/pgsql/src/test/recovery/t/040_standby_f > > ailover_slots_sync.pl > > > line 287. > > > > > > The reason is that in LOGs, we see a different ERROR message than what > > > is expected: > > > 2024-02-14 04:52:32.916 UTC [1767765][client backend][3/4:0] ERROR: > > > replication slot "lsub1_slot" is active for PID 1760871 > > > > > > Now, we see the slot still active because a test before these tests (# > > > Test that if the synchronized slot is invalidated while the remote > > > slot is still valid, ....) is not able to successfully persist the > > > slot and the synced temporary slot remains active. > > > > > > The reason is clear by referring to below standby LOGS: > > > > > > LOG: connection authorized: user=bf database=postgres > > > application_name=040_standby_failover_slots_sync.pl > > > LOG: statement: SELECT pg_sync_replication_slots(); > > > LOG: dropped replication slot "lsub1_slot" of dbid 5 > > > STATEMENT: SELECT pg_sync_replication_slots(); ... > > > SELECT conflict_reason IS NULL AND synced FROM pg_replication_slots > > > WHERE slot_name = 'lsub1_slot'; > > > > > > In the above LOGs, we should ideally see: "newly created slot > > > "lsub1_slot" is sync-ready now" after the "LOG: dropped replication > > > slot "lsub1_slot" of dbid 5" but lack of that means the test didn't > > > accomplish what it was supposed to. Ideally, the same test should have > > > failed but the pass criteria for the test failed to check whether the > > > slot is persisted or not. > > > > > > The probable reason for failure is that remote_slot's restart_lsn lags > > > behind the oldest WAL segment on standby. Now, in the test, we do > > > ensure that the publisher and subscriber are caught up by following > > > steps: > > > # Enable the subscription to let it catch up to the latest wal > > > position $subscriber1->safe_psql('postgres', > > > "ALTER SUBSCRIPTION regress_mysub1 ENABLE"); > > > > > > $primary->wait_for_catchup('regress_mysub1'); > > > > > > However, this doesn't guarantee that restart_lsn is moved to a > > > position new enough that standby has a WAL corresponding to it. > > > > > > > To ensure that restart_lsn has been moved to a recent position, we need to log > > XLOG_RUNNING_XACTS and make sure the same is processed as well by > > walsender. The attached patch does the required change. 
> > > > Hou-San can reproduce this problem by adding additional checkpoints in the > > test and after applying the attached it fixes the problem. Now, this patch is > > mostly based on the theory we formed based on LOGs on BF and a reproducer > > by Hou-San, so still, there is some chance that this doesn't fix the BF failures in > > which case I'll again look into those. > > I have verified that the patch can fix the issue on my machine(after adding few > more checkpoints before slot invalidation test.) I also added one more check in > the test to confirm the synced slot is not temp slot. Here is the v2 patch. Thanks! +# To ensure that restart_lsn has moved to a recent WAL position, we need +# to log XLOG_RUNNING_XACTS and make sure the same is processed as well +$primary->psql('postgres', "CHECKPOINT"); Instead of "CHECKPOINT" wouldn't a less heavy "SELECT pg_log_standby_snapshot();" be enough? Not a big deal but maybe we could do the change while modifying 040_standby_failover_slots_sync.pl in the next patch "Add a new slotsync worker". Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Wed, Feb 14, 2024 at 10:40:11AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Wednesday, February 14, 2024 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > To ensure that restart_lsn has been moved to a recent position, we need to log > > > XLOG_RUNNING_XACTS and make sure the same is processed as well by > > > walsender. The attached patch does the required change. > > > > > > Hou-San can reproduce this problem by adding additional checkpoints in the > > > test and after applying the attached it fixes the problem. Now, this patch is > > > mostly based on the theory we formed based on LOGs on BF and a reproducer > > > by Hou-San, so still, there is some chance that this doesn't fix the BF failures in > > > which case I'll again look into those. > > > > I have verified that the patch can fix the issue on my machine(after adding few > > more checkpoints before slot invalidation test.) I also added one more check in > > the test to confirm the synced slot is not temp slot. Here is the v2 patch. > > Thanks! > > +# To ensure that restart_lsn has moved to a recent WAL position, we need > +# to log XLOG_RUNNING_XACTS and make sure the same is processed as well > +$primary->psql('postgres', "CHECKPOINT"); > > Instead of "CHECKPOINT" wouldn't a less heavy "SELECT pg_log_standby_snapshot();" > be enough? > Yeah, that would be enough. However, the test still fails randomly due to the same reason. See [1]. So, as mentioned yesterday, now, I feel it is better to recreate the subscription/slot so that it can get the latest restart_lsn rather than relying on pg_log_standby_snapshot() to move it. > Not a big deal but maybe we could do the change while modifying > 040_standby_failover_slots_sync.pl in the next patch "Add a new slotsync worker". > Right, we can do that or probably this test would have made more sense with a worker patch where we could wait for the slot to be synced. Anyway, let's try to recreate the slot/subscription idea. BTW, do you think that adding a LOG when we are not able to sync will help in debugging such problems? I think eventually we can change it to DEBUG1 but for now, it can help with stabilizing BF and or some other reported issues. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-02-15%2000%3A14%3A38 -- With Regards, Amit Kapila.
On Thursday, February 15, 2024 10:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Wed, Feb 14, 2024 at 10:40:11AM +0000, Zhijie Hou (Fujitsu) wrote: > > > On Wednesday, February 14, 2024 6:05 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > > > > To ensure that restart_lsn has been moved to a recent position, we > > > > need to log XLOG_RUNNING_XACTS and make sure the same is processed > > > > as well by walsender. The attached patch does the required change. > > > > > > > > Hou-San can reproduce this problem by adding additional > > > > checkpoints in the test and after applying the attached it fixes > > > > the problem. Now, this patch is mostly based on the theory we > > > > formed based on LOGs on BF and a reproducer by Hou-San, so still, > > > > there is some chance that this doesn't fix the BF failures in which case I'll > again look into those. > > > > > > I have verified that the patch can fix the issue on my machine(after > > > adding few more checkpoints before slot invalidation test.) I also > > > added one more check in the test to confirm the synced slot is not temp slot. > Here is the v2 patch. > > > > Thanks! > > > > +# To ensure that restart_lsn has moved to a recent WAL position, we > > +need # to log XLOG_RUNNING_XACTS and make sure the same is processed > > +as well $primary->psql('postgres', "CHECKPOINT"); > > > > Instead of "CHECKPOINT" wouldn't a less heavy "SELECT > pg_log_standby_snapshot();" > > be enough? > > > > Yeah, that would be enough. However, the test still fails randomly due to the > same reason. See [1]. So, as mentioned yesterday, now, I feel it is better to > recreate the subscription/slot so that it can get the latest restart_lsn rather than > relying on pg_log_standby_snapshot() to move it. > > > Not a big deal but maybe we could do the change while modifying > > 040_standby_failover_slots_sync.pl in the next patch "Add a new slotsync > worker". > > > > Right, we can do that or probably this test would have made more sense with a > worker patch where we could wait for the slot to be synced. > Anyway, let's try to recreate the slot/subscription idea. BTW, do you think that > adding a LOG when we are not able to sync will help in debugging such > problems? I think eventually we can change it to DEBUG1 but for now, it can help > with stabilizing BF and or some other reported issues. Here is the patch that attempts the re-create sub idea. I also think that a LOG/DEBUG would be useful for such analysis, so the 0002 is to add such a log. Best Regards, Hou zj
Attachment
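Expressed as SQL on the subscriber, the re-create idea boils down to something like the sketch below (the subscription and publication names and the connection string are placeholders; the actual change is made in the TAP test). Re-creating the subscription makes the publisher build the failover slot afresh, so its restart_lsn starts at a recent WAL position instead of having to be advanced afterwards.

DROP SUBSCRIPTION IF EXISTS regress_mysub1;
CREATE SUBSCRIPTION regress_mysub1
    CONNECTION 'host=primary dbname=postgres'
    PUBLICATION regress_mypub
    WITH (failover = true);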
Hi, Since the slotsync function has been committed, I rebased the remaining patches. Here is the V88 patch set. Best Regards, Hou zj
Attachment
On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, February 15, 2024 10:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > > > > Right, we can do that or probably this test would have made more sense with a > > worker patch where we could wait for the slot to be synced. > > Anyway, let's try to recreate the slot/subscription idea. BTW, do you think that > > adding a LOG when we are not able to sync will help in debugging such > > problems? I think eventually we can change it to DEBUG1 but for now, it can help > > with stabilizing BF and or some other reported issues. > > Here is the patch that attempts the re-create sub idea. > Pushed this. > I also think that a LOG/DEBUG > would be useful for such analysis, so the 0002 is to add such a log. > I feel such a LOG would be useful. + ereport(LOG, + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" + " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)", I think waiting is a bit misleading here, how about something like: "could not sync slot information as remote slot precedes local slot: remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X), catalog xmin (%u)" -- With Regards, Amit Kapila.
Hi, On Thu, Feb 15, 2024 at 02:49:54PM +0530, Amit Kapila wrote: > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Thursday, February 15, 2024 10:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > > > > > > Right, we can do that or probably this test would have made more sense with a > > > worker patch where we could wait for the slot to be synced. > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do you think that > > > adding a LOG when we are not able to sync will help in debugging such > > > problems? I think eventually we can change it to DEBUG1 but for now, it can help > > > with stabilizing BF and or some other reported issues. > > > > Here is the patch that attempts the re-create sub idea. > > > > Pushed this. > > > > I also think that a LOG/DEBUG > > would be useful for such analysis, so the 0002 is to add such a log. > > > > I feel such a LOG would be useful. Same here. > + ereport(LOG, > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" > + " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)", > > I think waiting is a bit misleading here, how about something like: > "could not sync slot information as remote slot precedes local slot: > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN > (%X/%X), catalog xmin (%u)" > This wording works for me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, February 15, 2024 5:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Thursday, February 15, 2024 10:49 AM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > > > > > > Right, we can do that or probably this test would have made more > > > sense with a worker patch where we could wait for the slot to be synced. > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do > > > you think that adding a LOG when we are not able to sync will help > > > in debugging such problems? I think eventually we can change it to > > > DEBUG1 but for now, it can help with stabilizing BF and or some other > reported issues. > > > > Here is the patch that attempts the re-create sub idea. > > > > Pushed this. > > > > I also think that a LOG/DEBUG > > would be useful for such analysis, so the 0002 is to add such a log. > > > > I feel such a LOG would be useful. > > + ereport(LOG, > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" > + " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)", > > I think waiting is a bit misleading here, how about something like: > "could not sync slot information as remote slot precedes local slot: > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X), > catalog xmin (%u)" Changed. Attach the v2 patch here. Apart from the new log message. I think we can add one more debug message in reserve_wal_for_local_slot, this could be useful to analyze the failure. And we can also enable the DEBUG log in the 040 tap-test, I see we have similar setting in 010_logical_decoding_timline and logging debug1 message doesn't increase noticable time on my machine. These are done in 0002. Best Regards, Hou zj
Attachment
On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, February 15, 2024 5:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 15, 2024 at 9:05 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > > wrote: > > > > > > On Thursday, February 15, 2024 10:49 AM Amit Kapila > > <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Feb 14, 2024 at 7:26 PM Bertrand Drouvot > > > > > > > > Right, we can do that or probably this test would have made more > > > > sense with a worker patch where we could wait for the slot to be synced. > > > > Anyway, let's try to recreate the slot/subscription idea. BTW, do > > > > you think that adding a LOG when we are not able to sync will help > > > > in debugging such problems? I think eventually we can change it to > > > > DEBUG1 but for now, it can help with stabilizing BF and or some other > > reported issues. > > > > > > Here is the patch that attempts the re-create sub idea. > > > > > > > Pushed this. > > > > > > > I also think that a LOG/DEBUG > > > would be useful for such analysis, so the 0002 is to add such a log. > > > > > > > I feel such a LOG would be useful. > > > > + ereport(LOG, > > + errmsg("waiting for remote slot \"%s\" LSN (%X/%X) and catalog xmin" > > + " (%u) to pass local slot LSN (%X/%X) and catalog xmin (%u)", > > > > I think waiting is a bit misleading here, how about something like: > > "could not sync slot information as remote slot precedes local slot: > > remote slot \"%s\": LSN (%X/%X), catalog xmin (%u) local slot: LSN (%X/%X), > > catalog xmin (%u)" > > Changed. > > Attach the v2 patch here. > > Apart from the new log message. I think we can add one more debug message in > reserve_wal_for_local_slot, this could be useful to analyze the failure. Yeah, that can also be helpful, but the added message looks naive to me. + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno); Instead of the above, how about something like: "segno: %ld of purposed restart_lsn for the synced slot, oldest_segno: %ld available"? > And we > can also enable the DEBUG log in the 040 tap-test, I see we have similar > setting in 010_logical_decoding_timline and logging debug1 message doesn't > increase noticable time on my machine. These are done in 0002. > I haven't tested it but I think this can help in debugging BF failures, if any. I am not sure if to keep it always like that but till the time these tests are stabilized, this sounds like a good idea. So, how, about just making test changes as a separate patch so that later if required we can revert/remove it easily? Bertrand, do you have any thoughts on this? -- With Regards, Amit Kapila.
Hi, On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > Attach the v2 patch here. > > > > Apart from the new log message. I think we can add one more debug message in > > reserve_wal_for_local_slot, this could be useful to analyze the failure. > > Yeah, that can also be helpful, but the added message looks naive to me. > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno); > > Instead of the above, how about something like: "segno: %ld of > purposed restart_lsn for the synced slot, oldest_segno: %ld > available"? Looks good to me. I'm not sure if it would make more sense to elog only if segno < oldest_segno means just before the XLogSegNoOffsetToRecPtr() call? But I'm fine with the proposed location too. > > > And we > > can also enable the DEBUG log in the 040 tap-test, I see we have similar > > setting in 010_logical_decoding_timline and logging debug1 message doesn't > > increase noticable time on my machine. These are done in 0002. > > > > I haven't tested it but I think this can help in debugging BF > failures, if any. I am not sure if to keep it always like that but > till the time these tests are stabilized, this sounds like a good > idea. So, how, about just making test changes as a separate patch so > that later if required we can revert/remove it easily? Bertrand, do > you have any thoughts on this? +1 on having DEBUG log in the 040 tap-test until it's stabilized (I think we took the same approach for 035_standby_logical_decoding.pl IIRC) and then revert it back. Also I was thinking: what about adding an output to pg_sync_replication_slots()? The output could be the number of sync slots that have been created and are not considered as sync-ready during the execution. I think that could be a good addition to v2-0001-Add-a-log-if-remote-slot-didn-t-catch-up-to-local.patch proposed here (should trigger special attention in case of non zero value). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > Attach the v2 patch here. > > > > > > Apart from the new log message. I think we can add one more debug message in > > > reserve_wal_for_local_slot, this could be useful to analyze the failure. > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno); > > > > Instead of the above, how about something like: "segno: %ld of > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > available"? > > Looks good to me. I'm not sure if it would make more sense to elog only if > segno < oldest_segno means just before the XLogSegNoOffsetToRecPtr() call? > > But I'm fine with the proposed location too. > I am also fine either way but the current location gives required information in more number of cases and could be helpful in debugging this new facility. > > > > > And we > > > can also enable the DEBUG log in the 040 tap-test, I see we have similar > > > setting in 010_logical_decoding_timline and logging debug1 message doesn't > > > increase noticable time on my machine. These are done in 0002. > > > > > > > I haven't tested it but I think this can help in debugging BF > > failures, if any. I am not sure if to keep it always like that but > > till the time these tests are stabilized, this sounds like a good > > idea. So, how, about just making test changes as a separate patch so > > that later if required we can revert/remove it easily? Bertrand, do > > you have any thoughts on this? > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I think we > took the same approach for 035_standby_logical_decoding.pl IIRC) and then revert > it back. > Good to know! > Also I was thinking: what about adding an output to pg_sync_replication_slots()? > The output could be the number of sync slots that have been created and are > not considered as sync-ready during the execution. > Yeah, we can consider outputting some information via this function like how many slots are synced and persisted but not sure what would be appropriate here. Because one can anyway find that or more information by querying pg_replication_slots. I think we can keep discussing what makes more sense as a return value but let's commit the debug/log patches as they will be helpful to analyze BF failures or any other issues reported. -- With Regards, Amit Kapila.
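To illustrate that last point, the catalog already exposes enough to tell sync-ready slots apart (a sketch, assuming the columns discussed in this thread on a current development build): synced slots that are still temporary are precisely the ones that have not become sync-ready yet.

SELECT slot_name, synced, temporary, conflict_reason
FROM pg_catalog.pg_replication_slots
WHERE synced;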
On Thu, Feb 15, 2024 at 12:07 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Since the slotsync function is committed, I rebased remaining patches. > And here is the V88 patch set. > Please find the improvements in some of the comments in v88_0001* attached. Kindly include these in next version, if you are okay with it. -- With Regards, Amit Kapila.
Attachment
Hi, On Thu, Feb 15, 2024 at 06:13:38PM +0530, Amit Kapila wrote: > On Thu, Feb 15, 2024 at 12:07 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Since the slotsync function is committed, I rebased remaining patches. > > And here is the V88 patch set. > > Thanks! > > Please find the improvements in some of the comments in v88_0001* > attached. Kindly include these in next version, if you are okay with > it. Looking at v88_0001, random comments: 1 === Commit message "Be enabling slot synchronization" Typo? s:Be/By 2 === + It enables a physical standby to synchronize logical failover slots + from the primary server so that logical subscribers are not blocked + after failover. Not sure "not blocked" is the right wording. "can be resumed from the new primary" maybe? (was discussed in [1]) 3 === +#define SlotSyncWorkerAllowed() \ + (sync_replication_slots && pmState == PM_HOT_STANDBY && \ + SlotSyncWorkerCanRestart()) Maybe add a comment above the macro explaining the logic? 4 === +#include "replication/walreceiver.h" #include "replication/slotsync.h" should be reverse order? 5 === + if (SlotSyncWorker->syncing) { - SpinLockRelease(&SlotSyncCtx->mutex); + SpinLockRelease(&SlotSyncWorker->mutex); ereport(ERROR, errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot synchronize replication slots concurrently")); } worth to add a test in 040_standby_failover_slots_sync.pl for it? 6 === +static void +slotsync_reread_config(bool restart) +{ worth to add test(s) in 040_standby_failover_slots_sync.pl for it? [1]: https://www.postgresql.org/message-id/CAA4eK1JcBG6TJ3o5iUd4z0BuTbciLV3dK4aKgb7OgrNGoLcfSQ%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Feb 15, 2024 at 05:58:47PM +0530, Amit Kapila wrote: > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Also I was thinking: what about adding an output to pg_sync_replication_slots()? > > The output could be the number of sync slots that have been created and are > > not considered as sync-ready during the execution. > > > > Yeah, we can consider outputting some information via this function > like how many slots are synced and persisted but not sure what would > be appropriate here. Because one can anyway find that or more > information by querying pg_replication_slots. Right, so maybe just return a bool that would indicate that at least one new created slot(s) is/are not sync-ready? (If so, then the details could be found in pg_replication_slots). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, February 15, 2024 8:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > Attach the v2 patch here. > > > > > > > > Apart from the new log message. I think we can add one more debug > > > > message in reserve_wal_for_local_slot, this could be useful to analyze the > failure. > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, segno); > > > > > > Instead of the above, how about something like: "segno: %ld of > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > available"? > > > > Looks good to me. I'm not sure if it would make more sense to elog > > only if segno < oldest_segno means just before the > XLogSegNoOffsetToRecPtr() call? > > > > But I'm fine with the proposed location too. > > > > I am also fine either way but the current location gives required information in > more number of cases and could be helpful in debugging this new facility. > > > > > > > > And we > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > similar setting in 010_logical_decoding_timline and logging debug1 > > > > message doesn't increase noticable time on my machine. These are done > in 0002. > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > failures, if any. I am not sure if to keep it always like that but > > > till the time these tests are stabilized, this sounds like a good > > > idea. So, how, about just making test changes as a separate patch so > > > that later if required we can revert/remove it easily? Bertrand, do > > > you have any thoughts on this? > > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > +think we > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > then revert it back. > > > > Good to know! > > > Also I was thinking: what about adding an output to > pg_sync_replication_slots()? > > The output could be the number of sync slots that have been created > > and are not considered as sync-ready during the execution. > > > > Yeah, we can consider outputting some information via this function like how > many slots are synced and persisted but not sure what would be appropriate > here. Because one can anyway find that or more information by querying > pg_replication_slots. I think we can keep discussing what makes more sense as a > return value but let's commit the debug/log patches as they will be helpful to > analyze BF failures or any other issues reported. Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could provide more information about communication between primary and standby. This also doesn't increase noticeable testing time on my machine for debug build. Best Regards, Hou zj
Attachment
On Friday, February 16, 2024 8:33 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > On Thursday, February 15, 2024 8:29 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > On Thu, Feb 15, 2024 at 5:46 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > On Thu, Feb 15, 2024 at 05:00:18PM +0530, Amit Kapila wrote: > > > > On Thu, Feb 15, 2024 at 4:29 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > Attach the v2 patch here. > > > > > > > > > > Apart from the new log message. I think we can add one more > > > > > debug message in reserve_wal_for_local_slot, this could be > > > > > useful to analyze the > > failure. > > > > > > > > Yeah, that can also be helpful, but the added message looks naive to me. > > > > + elog(DEBUG1, "segno: %ld oldest_segno: %ld", oldest_segno, > > > > + segno); > > > > > > > > Instead of the above, how about something like: "segno: %ld of > > > > purposed restart_lsn for the synced slot, oldest_segno: %ld > > > > available"? > > > > > > Looks good to me. I'm not sure if it would make more sense to elog > > > only if segno < oldest_segno means just before the > > XLogSegNoOffsetToRecPtr() call? > > > > > > But I'm fine with the proposed location too. > > > > > > > I am also fine either way but the current location gives required > > information in more number of cases and could be helpful in debugging this > new facility. > > > > > > > > > > > And we > > > > > can also enable the DEBUG log in the 040 tap-test, I see we have > > > > > similar setting in 010_logical_decoding_timline and logging > > > > > debug1 message doesn't increase noticable time on my machine. > > > > > These are done > > in 0002. > > > > > > > > > > > > > I haven't tested it but I think this can help in debugging BF > > > > failures, if any. I am not sure if to keep it always like that but > > > > till the time these tests are stabilized, this sounds like a good > > > > idea. So, how, about just making test changes as a separate patch > > > > so that later if required we can revert/remove it easily? > > > > Bertrand, do you have any thoughts on this? > > > > > > +1 on having DEBUG log in the 040 tap-test until it's stabilized (I > > > +think we > > > took the same approach for 035_standby_logical_decoding.pl IIRC) and > > > then revert it back. > > > > > > > Good to know! > > > > > Also I was thinking: what about adding an output to > > pg_sync_replication_slots()? > > > The output could be the number of sync slots that have been created > > > and are not considered as sync-ready during the execution. > > > > > > > Yeah, we can consider outputting some information via this function > > like how many slots are synced and persisted but not sure what would > > be appropriate here. Because one can anyway find that or more > > information by querying pg_replication_slots. I think we can keep > > discussing what makes more sense as a return value but let's commit > > the debug/log patches as they will be helpful to analyze BF failures or any > other issues reported. > > Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could > provide more information about communication between primary and standby. > This also doesn't increase noticeable testing time on my machine for debug > build. Sorry, there was a miss in the DEBUG message, I should have used UINT64_FORMAT for XLogSegNo(uint64) instead of %ld. Here is a small patch to fix this. Best Regards, Hou zj
Attachment
On Fri, Feb 16, 2024 at 11:12 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, February 16, 2024 8:33 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > > > Yeah, we can consider outputting some information via this function > > > like how many slots are synced and persisted but not sure what would > > > be appropriate here. Because one can anyway find that or more > > > information by querying pg_replication_slots. I think we can keep > > > discussing what makes more sense as a return value but let's commit > > > the debug/log patches as they will be helpful to analyze BF failures or any > > other issues reported. > > > > Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it could > > provide more information about communication between primary and standby. > > This also doesn't increase noticeable testing time on my machine for debug > > build. > > Sorry, there was a miss in the DEBUG message, I should have used > UINT64_FORMAT for XLogSegNo(uint64) instead of %ld. Here is a small patch > to fix this. > Thanks for noticing this. I have pushed all your debug patches. Let's hope that if there is a BF failure next time, we can gather enough information to determine its cause. -- With Regards, Amit Kapila.
Hi, On Fri, Feb 16, 2024 at 12:32:45AM +0000, Zhijie Hou (Fujitsu) wrote: > Agreed. Here is new patch set as suggested. I used debug2 in the 040 as it > could provide more information about communication between primary and standby. > This also doesn't increase noticeable testing time on my machine for debug > build. Same here, and there is no big difference as far as the amount of log generated is concerned: Without the patch: $ du -sh ./src/test/recovery/tmp_check/log/*040* 4.0K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_cascading_standby.log 24K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_publisher.log 16K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_standby1.log 4.0K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_subscriber1.log 12K ./src/test/recovery/tmp_check/log/regress_log_040_standby_failover_slots_sync With the patch: $ du -sh ./src/test/recovery/tmp_check/log/*040* 4.0K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_cascading_standby.log 36K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_publisher.log 48K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_standby1.log 4.0K ./src/test/recovery/tmp_check/log/040_standby_failover_slots_sync_subscriber1.log 12K ./src/test/recovery/tmp_check/log/regress_log_040_standby_failover_slots_sync Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Feb 16, 2024 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Thanks for noticing this. I have pushed all your debug patches. Let's > hope that if there is a BF failure next time, we can gather enough > information to determine its cause. > There is a new BF failure [1] after adding these LOGs and I think I know what is going wrong. First, let's look at the standby LOGs: 2024-02-16 06:18:18.442 UTC [241414][client backend][2/14:0] DEBUG: segno: 4 of purposed restart_lsn for the synced slot, oldest_segno: 4 available 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] DEBUG: xmin required by slots: data 0, catalog 741 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] LOG: could not sync slot information as remote slot precedes local slot: remote slot "lsub1_slot": LSN (0/4000168), catalog xmin (739) local slot: LSN (0/4000168), catalog xmin (741) So, from the above LOG, it is clear that the remote slot's catalog xmin (739) precedes the local catalog xmin (741), which prevents the sync on the standby from completing. Next, let's look at the LOG from the primary around the same time: 2024-02-16 06:18:11.354 UTC [238037][autovacuum worker][5/17:0] DEBUG: analyzing "pg_catalog.pg_depend" 2024-02-16 06:18:11.360 UTC [238037][autovacuum worker][5/17:0] DEBUG: "pg_depend": scanned 13 of 13 pages, containing 1709 live rows and 0 dead rows; 1709 rows in sample, 1709 estimated total rows ... 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/0:0] DEBUG: Autovacuum VacuumUpdateCosts(db=1, rel=14050, dobalance=yes, cost_limit=200, cost_delay=2 active=yes failsafe=no) 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/19:0] DEBUG: analyzing "information_schema.sql_features" 2024-02-16 06:18:11.377 UTC [238037][autovacuum worker][5/19:0] DEBUG: "sql_features": scanned 8 of 8 pages, containing 756 live rows and 0 dead rows; 756 rows in sample, 756 estimated total rows This shows that the autovacuum worker has analyzed a catalog table and, to update its statistics in pg_statistic, would have acquired a new transaction id. Now, after the slot creation, a new transaction id that has updated the catalog is generated on the primary and would have been replicated to the standby. Due to this, the catalog_xmin of the primary's slot precedes the standby's catalog_xmin, and we see this failure. As per this theory, we should disable autovacuum on the primary to avoid updates to catalog_xmin values. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-02-16%2006%3A12%3A59 -- With Regards, Amit Kapila.
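The mechanism described above can be reproduced by hand on an otherwise idle primary (a sketch, assuming a superuser session and no concurrent activity): analyzing a catalog table writes to pg_statistic and therefore consumes a transaction id that modifies the catalogs.

SELECT pg_current_xact_id();   -- assigns an XID to this transaction
ANALYZE pg_catalog.pg_depend;  -- updates pg_statistic, consuming another XID
SELECT pg_current_xact_id();   -- at least two higher than the first value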
Hi, On Fri, Feb 16, 2024 at 01:12:31PM +0530, Amit Kapila wrote: > On Fri, Feb 16, 2024 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Thanks for noticing this. I have pushed all your debug patches. Let's > > hope if there is a BF failure next time, we can gather enough > > information to know the reason of the same. > > > > There is a new BF failure [1] after adding these LOGs and I think I > know what is going wrong. First, let's look at standby LOGs: > > 2024-02-16 06:18:18.442 UTC [241414][client backend][2/14:0] DEBUG: > segno: 4 of purposed restart_lsn for the synced slot, oldest_segno: 4 > available > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] DEBUG: > xmin required by slots: data 0, catalog 741 > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] LOG:mote > could not sync slot information as reslot precedes local slot: remote > slot "lsub1_slot": LSN (0/4000168), catalog xmin (739) local slot: LSN > (0/4000168), catalog xmin (741) > > So, from the above LOG, it is clear that the remote slot's catalog > xmin (739) precedes the local catalog xmin (741) which makes the sync > on standby to not complete. Yeah, catalog_xmin was the other suspect (with restart_lsn) and agree it is the culprit here. > Next, let's look at the LOG from the primary during the nearby time: > 2024-02-16 06:18:11.354 UTC [238037][autovacuum worker][5/17:0] DEBUG: > analyzing "pg_catalog.pg_depend" > 2024-02-16 06:18:11.360 UTC [238037][autovacuum worker][5/17:0] DEBUG: > "pg_depend": scanned 13 of 13 pages, containing 1709 live rows and 0 > dead rows; 1709 rows in sample, 1709 estimated total rows > ... > 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/0:0] DEBUG: > Autovacuum VacuumUpdateCosts(db=1, rel=14050, dobalance=yes, > cost_limit=200, cost_delay=2 active=yes failsafe=no) > 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/19:0] DEBUG: > analyzing "information_schema.sql_features" > 2024-02-16 06:18:11.377 UTC [238037][autovacuum worker][5/19:0] DEBUG: > "sql_features": scanned 8 of 8 pages, containing 756 live rows and 0 > dead rows; 756 rows in sample, 756 estimated total rows > > It shows us that autovacuum worker has analyzed catalog table and for > updating its statistics in pg_statistic table, it would have acquired > a new transaction id. Now, after the slot creation, a new transaction > id that has updated the catalog is generated on primary and would have > been replication to standby. Due to this catalog_xmin of primary's > slot would precede standby's catalog_xmin and we see this failure. > > As per this theory, we should disable autovacuum on primary to avoid > updates to catalog_xmin values. Makes sense to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Friday, February 16, 2024 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 16, 2024 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Thanks for noticing this. I have pushed all your debug patches. Let's > > hope if there is a BF failure next time, we can gather enough > > information to know the reason of the same. > > > > There is a new BF failure [1] after adding these LOGs and I think I know what is > going wrong. First, let's look at standby LOGs: > > 2024-02-16 06:18:18.442 UTC [241414][client backend][2/14:0] DEBUG: > segno: 4 of purposed restart_lsn for the synced slot, oldest_segno: 4 available > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] DEBUG: > xmin required by slots: data 0, catalog 741 > 2024-02-16 06:18:18.443 UTC [241414][client backend][2/14:0] LOG:mote could > not sync slot information as reslot precedes local slot: remote slot "lsub1_slot": > LSN (0/4000168), catalog xmin (739) local slot: LSN (0/4000168), catalog xmin > (741) > > So, from the above LOG, it is clear that the remote slot's catalog xmin (739) > precedes the local catalog xmin (741) which makes the sync on standby to not > complete. > > Next, let's look at the LOG from the primary during the nearby time: > 2024-02-16 06:18:11.354 UTC [238037][autovacuum worker][5/17:0] DEBUG: > analyzing "pg_catalog.pg_depend" > 2024-02-16 06:18:11.360 UTC [238037][autovacuum worker][5/17:0] DEBUG: > "pg_depend": scanned 13 of 13 pages, containing 1709 live rows and 0 dead > rows; 1709 rows in sample, 1709 estimated total rows ... > 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/0:0] DEBUG: > Autovacuum VacuumUpdateCosts(db=1, rel=14050, dobalance=yes, > cost_limit=200, cost_delay=2 active=yes failsafe=no) > 2024-02-16 06:18:11.372 UTC [238037][autovacuum worker][5/19:0] DEBUG: > analyzing "information_schema.sql_features" > 2024-02-16 06:18:11.377 UTC [238037][autovacuum worker][5/19:0] DEBUG: > "sql_features": scanned 8 of 8 pages, containing 756 live rows and 0 dead rows; > 756 rows in sample, 756 estimated total rows > > It shows us that autovacuum worker has analyzed catalog table and for updating > its statistics in pg_statistic table, it would have acquired a new transaction id. Now, > after the slot creation, a new transaction id that has updated the catalog is > generated on primary and would have been replication to standby. Due to this > catalog_xmin of primary's slot would precede standby's catalog_xmin and we see > this failure. > > As per this theory, we should disable autovacuum on primary to avoid updates to > catalog_xmin values. > Agreed. Here is the patch to disable autovacuum in the test. Best Regards, Hou zj
Attachment
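For reference, what the fix amounts to, expressed as SQL against a running primary, is simply the following (the patch itself sets the parameter in the test node's configuration instead):

ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();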
On Thu, Feb 15, 2024 at 10:48 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Looking at v88_0001, random comments: Thanks for the feedback. > > 1 === > > Commit message "Be enabling slot synchronization" > > Typo? s:Be/By Modified. > 2 === > > + It enables a physical standby to synchronize logical failover slots > + from the primary server so that logical subscribers are not blocked > + after failover. > > Not sure "not blocked" is the right wording. > "can be resumed from the new primary" maybe? (was discussed in [1]) Modified. > 3 === > > +#define SlotSyncWorkerAllowed() \ > + (sync_replication_slots && pmState == PM_HOT_STANDBY && \ > + SlotSyncWorkerCanRestart()) > > Maybe add a comment above the macro explaining the logic? Done. > 4 === > > +#include "replication/walreceiver.h" > #include "replication/slotsync.h" > > should be reverse order? Removed walreceiver.h inclusion as it was not needed. > 5 === > > + if (SlotSyncWorker->syncing) > { > - SpinLockRelease(&SlotSyncCtx->mutex); > + SpinLockRelease(&SlotSyncWorker->mutex); > ereport(ERROR, > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("cannot synchronize replication slots concurrently")); > } > > worth to add a test in 040_standby_failover_slots_sync.pl for it? It will be very difficult to stabilize this test as we have to make sure that the concurrent users (SQL function(s) and/or worker(s)) are in that target function at the same time to hit it. > > 6 === > > +static void > +slotsync_reread_config(bool restart) > +{ > > worth to add test(s) in 040_standby_failover_slots_sync.pl for it? Added test. Please find v89 patch set. The other changes are: patch001: 1) Addressed some comments by Amit and Ajin given off-list. 2) Removed redundant header inclusions from slotsync.c. 3) Corrected the value returned by validate_remote_info(). 4) Restructured code around validate_remote_info. 5) Improved comments and commit msg. patch002: Rebased it. thanks Shveta
Attachment
On Fri, Feb 16, 2024 at 4:10 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 15, 2024 at 10:48 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > 5 === > > > > + if (SlotSyncWorker->syncing) > > { > > - SpinLockRelease(&SlotSyncCtx->mutex); > > + SpinLockRelease(&SlotSyncWorker->mutex); > > ereport(ERROR, > > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("cannot synchronize replication slots concurrently")); > > } > > > > worth to add a test in 040_standby_failover_slots_sync.pl for it? > > It will be very difficult to stabilize this test as we have to make > sure that the concurrent users (SQL function(s) and/or worker(s)) are > in that target function at the same time to hit it. > Yeah, I also think would be tricky to write a stable test, maybe one can explore using a new injection point facility but I don't think it is worth for this error check as this appears straightforward to be broken in the future by other changes. -- With Regards, Amit Kapila.
On Friday, February 16, 2024 6:41 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 15, 2024 at 10:48 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Looking at v88_0001, random comments: > > Thanks for the feedback. > > > > > 1 === > > > > Commit message "Be enabling slot synchronization" > > > > Typo? s:Be/By > > Modified. > > > 2 === > > > > + It enables a physical standby to synchronize logical failover slots > > + from the primary server so that logical subscribers are not blocked > > + after failover. > > > > Not sure "not blocked" is the right wording. > > "can be resumed from the new primary" maybe? (was discussed in [1]) > > Modified. > > > 3 === > > > > +#define SlotSyncWorkerAllowed() \ > > + (sync_replication_slots && pmState == PM_HOT_STANDBY && \ > > + SlotSyncWorkerCanRestart()) > > > > Maybe add a comment above the macro explaining the logic? > > Done. > > > 4 === > > > > +#include "replication/walreceiver.h" > > #include "replication/slotsync.h" > > > > should be reverse order? > > Removed walreceiver.h inclusion as it was not needed. > > > 5 === > > > > + if (SlotSyncWorker->syncing) > > { > > - SpinLockRelease(&SlotSyncCtx->mutex); > > + SpinLockRelease(&SlotSyncWorker->mutex); > > ereport(ERROR, > > > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("cannot synchronize replication > slots concurrently")); > > } > > > > worth to add a test in 040_standby_failover_slots_sync.pl for it? > > It will be very difficult to stabilize this test as we have to make sure that the > concurrent users (SQL function(s) and/or worker(s)) are in that target function at > the same time to hit it. > > > > > 6 === > > > > +static void > > +slotsync_reread_config(bool restart) > > +{ > > > > worth to add test(s) in 040_standby_failover_slots_sync.pl for it? > > Added test. > > Please find v89 patch set. The other changes are: Thanks for the patch. Here are few comments: 1. +static char * +get_dbname_from_conninfo(const char *conninfo) +{ + static char *dbname; + + if (dbname) + return dbname; + else + dbname = walrcv_get_dbname_from_conninfo(conninfo); + + return dbname; +} I think it's not necessary to have a static variable here, because the guc check doesn't seem performance sensitive. Additionaly, it does not work well with slotsync SQL functions, because the dbname will be allocated in temp memory context when reaching here via SQL function, so it's not safe to access this static variable in next function call. 2. +static bool +validate_remote_info(WalReceiverConn *wrconn, int elevel) ... + + return (!remote_in_recovery && primary_slot_valid); The primary_slot_valid could be uninitialized in this function. Best Regards, Hou zj
On Sun, Feb 18, 2024 at 7:40 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, February 16, 2024 6:41 PM shveta malik <shveta.malik@gmail.com> wrote: > Thanks for the patch. Here are few comments: Thanks for the comments. > > 2. > > +static bool > +validate_remote_info(WalReceiverConn *wrconn, int elevel) > ... > + > + return (!remote_in_recovery && primary_slot_valid); > > The primary_slot_valid could be uninitialized in this function. return (!remote_in_recovery && primary_slot_valid); Here if remote_in_recovery is true, it will not even read primary_slot_valid. It will read primary_slot_valid only if remote_in_recovery is false and in such a case primary_slot_valid will always be initialized in the else block above, let me know if you still feel we shall initialize this to some default? thanks Shveta
On Monday, February 19, 2024 11:39 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Sun, Feb 18, 2024 at 7:40 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, February 16, 2024 6:41 PM shveta malik <shveta.malik@gmail.com> > wrote: > > Thanks for the patch. Here are few comments: > > Thanks for the comments. > > > > > 2. > > > > +static bool > > +validate_remote_info(WalReceiverConn *wrconn, int elevel) > > ... > > + > > + return (!remote_in_recovery && primary_slot_valid); > > > > The primary_slot_valid could be uninitialized in this function. > > return (!remote_in_recovery && primary_slot_valid); > > Here if remote_in_recovery is true, it will not even read primary_slot_valid. It > will read primary_slot_valid only if remote_in_recovery is false and in such a > case primary_slot_valid will always be initialized in the else block above, let me > know if you still feel we shall initialize this to some default? I understand that it will not be used, but some complier could report WARNING for the un-initialized variable. The cfbot[1] seems complain about this as well. [1] https://cirrus-ci.com/task/5416851522453504 Best Regards, Hou zj
On Mon, Feb 19, 2024 at 9:32 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > I understand that it will not be used, but some complier could report WARNING > for the un-initialized variable. The cfbot[1] seems complain about this as well. > > [1] https://cirrus-ci.com/task/5416851522453504 Okay I see. Thanks for pointing it out. Here are the patches addressing your comments. Changes are in patch001, rest are rebased. thanks Shveta
Attachment
Hi, On Sat, Feb 17, 2024 at 10:10:18AM +0530, Amit Kapila wrote: > On Fri, Feb 16, 2024 at 4:10 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Feb 15, 2024 at 10:48 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > 5 === > > > > > > + if (SlotSyncWorker->syncing) > > > { > > > - SpinLockRelease(&SlotSyncCtx->mutex); > > > + SpinLockRelease(&SlotSyncWorker->mutex); > > > ereport(ERROR, > > > errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > > errmsg("cannot synchronize replication slots concurrently")); > > > } > > > > > > worth to add a test in 040_standby_failover_slots_sync.pl for it? > > > > It will be very difficult to stabilize this test as we have to make > > sure that the concurrent users (SQL function(s) and/or worker(s)) are > > in that target function at the same time to hit it. > > > > Yeah, I also think would be tricky to write a stable test, maybe one > can explore using a new injection point facility but I don't think it > is worth for this error check as this appears straightforward to be > broken in the future by other changes. Yeah, injection point would probably be the way to go. Agree that's probably not worth adding such a test (we can change our mind later on if needed anyway). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Feb 19, 2024 at 9:46 AM shveta malik <shveta.malik@gmail.com> wrote: > > Okay I see. Thanks for pointing it out. Here are the patches > addressing your comments. Changes are in patch001, rest are rebased. > Few comments on 0001 ==================== 1. I think it is better to error out when the valid GUC or option is not set in ensure_valid_slotsync_params() and ensure_valid_remote_info() instead of waiting. And we shouldn't start the worker in the first place if not all required GUCs are set. This will additionally simplify the code a bit. 2. +typedef struct SlotSyncWorkerCtxStruct { - /* prevents concurrent slot syncs to avoid slot overwrites */ + pid_t pid; + bool stopSignaled; bool syncing; + time_t last_start_time; slock_t mutex; -} SlotSyncCtxStruct; +} SlotSyncWorkerCtxStruct; I think we don't need to change the name of this struct as this can be used both by the worker and the backend. We can probably add the comment to indicate that all the fields except 'syncing' are used by slotsync worker. 3. Similar to above, function names like SlotSyncShmemInit() shouldn't be changed to SlotSyncWorkerShmemInit(). 4. +ReplSlotSyncWorkerMain(int argc, char *argv[]) { ... + on_shmem_exit(slotsync_worker_onexit, (Datum) 0); ... + before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn)); ... } Do we need two separate callbacks? Can't we have just one (say slotsync_failure_callback) that cleans additional things in case of slotsync worker? And, if we need both those callbacks then please add some comments for both and why one is before_shmem_exit() and the other is on_shmem_exit(). In addition to the above, I have made a few changes in the comments and code (cosmetic code changes). Please include those in the next version if you find them okay. You need to rename .txt file to remove .txt and apply atop v90-0001*. -- With Regards, Amit Kapila.
Attachment
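As a side note, the settings such validation is concerned with can be inspected on the standby with a query along these lines (the exact parameter set shown is an assumption based on the requirements discussed in this thread: sync_replication_slots enabled, wal_level = logical, hot_standby_feedback on, primary_slot_name set, and a dbname in primary_conninfo):

SELECT name, setting
FROM pg_settings
WHERE name IN ('sync_replication_slots', 'wal_level',
               'hot_standby_feedback', 'primary_slot_name',
               'primary_conninfo');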
On Mon, Feb 19, 2024 at 5:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few comments on 0001 Thanks for the feedback. > ==================== > 1. I think it is better to error out when the valid GUC or option is > not set in ensure_valid_slotsync_params() and > ensure_valid_remote_info() instead of waiting. And we shouldn't start > the worker in the first place if not all required GUCs are set. This > will additionally simplify the code a bit. Sure, removed 'ensure' functions. Moved the validation checks to the postmaster before starting the slot sync worker. > 2. > +typedef struct SlotSyncWorkerCtxStruct > { > - /* prevents concurrent slot syncs to avoid slot overwrites */ > + pid_t pid; > + bool stopSignaled; > bool syncing; > + time_t last_start_time; > slock_t mutex; > -} SlotSyncCtxStruct; > +} SlotSyncWorkerCtxStruct; > > I think we don't need to change the name of this struct as this can be > used both by the worker and the backend. We can probably add the > comment to indicate that all the fields except 'syncing' are used by > slotsync worker. Modified. > 3. Similar to above, function names like SlotSyncShmemInit() shouldn't > be changed to SlotSyncWorkerShmemInit(). Modified. > 4. > +ReplSlotSyncWorkerMain(int argc, char *argv[]) > { > ... > + on_shmem_exit(slotsync_worker_onexit, (Datum) 0); > ... > + before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn)); > ... > } > > Do we need two separate callbacks? Can't we have just one (say > slotsync_failure_callback) that cleans additional things in case of > slotsync worker? And, if we need both those callbacks then please add > some comments for both and why one is before_shmem_exit() and the > other is on_shmem_exit(). I think we can merge these now. Earlier 'on_shmem_exit' was needed to avoid race-condition between startup and slot sync worker process to drop 'i' slots on promotion. Now we do not have any such scenario. But I need some time to analyze it well. Will do it in the next version. > In addition to the above, I have made a few changes in the comments > and code (cosmetic code changes). Please include those in the next > version if you find them okay. You need to rename .txt file to remove > .txt and apply atop v90-0001*. Sure, included these. Please find the patch001 attached. I will rebase the rest of the patches and post them tomorrow. thanks Shveta
Attachment
On Mon, Feb 19, 2024 at 9:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Feb 19, 2024 at 5:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Few comments on 0001 > > Thanks for the feedback. > > > ==================== > > 1. I think it is better to error out when the valid GUC or option is > > not set in ensure_valid_slotsync_params() and > > ensure_valid_remote_info() instead of waiting. And we shouldn't start > > the worker in the first place if not all required GUCs are set. This > > will additionally simplify the code a bit. > > Sure, removed 'ensure' functions. Moved the validation checks to the > postmaster before starting the slot sync worker. > > > 2. > > +typedef struct SlotSyncWorkerCtxStruct > > { > > - /* prevents concurrent slot syncs to avoid slot overwrites */ > > + pid_t pid; > > + bool stopSignaled; > > bool syncing; > > + time_t last_start_time; > > slock_t mutex; > > -} SlotSyncCtxStruct; > > +} SlotSyncWorkerCtxStruct; > > > > I think we don't need to change the name of this struct as this can be > > used both by the worker and the backend. We can probably add the > > comment to indicate that all the fields except 'syncing' are used by > > slotsync worker. > > Modified. > > > 3. Similar to above, function names like SlotSyncShmemInit() shouldn't > > be changed to SlotSyncWorkerShmemInit(). > > Modified. > > > 4. > > +ReplSlotSyncWorkerMain(int argc, char *argv[]) > > { > > ... > > + on_shmem_exit(slotsync_worker_onexit, (Datum) 0); > > ... > > + before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn)); > > ... > > } > > > > Do we need two separate callbacks? Can't we have just one (say > > slotsync_failure_callback) that cleans additional things in case of > > slotsync worker? And, if we need both those callbacks then please add > > some comments for both and why one is before_shmem_exit() and the > > other is on_shmem_exit(). > > I think we can merge these now. Earlier 'on_shmem_exit' was needed to > avoid race-condition between startup and slot sync worker process to > drop 'i' slots on promotion. Now we do not have any such scenario. > But I need some time to analyze it well. Will do it in the next > version. > > > In addition to the above, I have made a few changes in the comments > > and code (cosmetic code changes). Please include those in the next > > version if you find them okay. You need to rename .txt file to remove > > .txt and apply atop v90-0001*. > > Sure, included these. > > Please find the patch001 attached. I've reviewed the v91 patch. Here are random comments: --- /* * Checks the remote server info. * - * We ensure that the 'primary_slot_name' exists on the remote server and the - * remote server is not a standby node. + * Check whether we are a cascading standby. For non-cascading standbys, it + * also ensures that the 'primary_slot_name' exists on the remote server. */ IIUC what the validate_remote_info() does doesn't not change by this patch, so the previous comment seems to be clearer to me. --- if (remote_in_recovery) + { + /* + * If we are a cascading standby, no need to check further for + * 'primary_slot_name'. 
+ */ ereport(ERROR, errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot synchronize replication slots from a standby server")); + } + else + { + bool primary_slot_valid; - primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); - Assert(!isnull); + primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); + Assert(!isnull); - if (!primary_slot_valid) - ereport(ERROR, - errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("bad configuration for slot synchronization"), - /* translator: second %s is a GUC variable name */ - errdetail("The replication slot \"%s\" specified by \"%s\" does not exist on the primary server.", - PrimarySlotName, "primary_slot_name")); + if (!primary_slot_valid) + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + /* translator: second %s is a GUC variable name */ + errdetail("The replication slot \"%s\" specified by \"%s\" does not exist on the primary server.", + PrimarySlotName, "primary_slot_name")); + } I think it's a refactoring rather than changes required by the slotsync worker. We can do that in a separate patch but why do we need this change in the first place? --- + ValidateSlotSyncParams(ERROR); + /* Load the libpq-specific functions */ load_file("libpqwalreceiver", false); - ValidateSlotSyncParams(); + (void) CheckDbnameInConninfo(); Is there any reason why we move where to check the parameters? Some comments not related to the patch but to the existing code: --- It might have already been discussed but is the src/backend/replication/logical the right place for the slocsync.c? If it's independent of logical decoding/replication, is under src/backend/replication could be more appropriate? --- /* Construct query to fetch slots with failover enabled. */ appendStringInfo(&s, "SELECT slot_name, plugin, confirmed_flush_lsn," " restart_lsn, catalog_xmin, two_phase, failover," " database, conflict_reason" " FROM pg_catalog.pg_replication_slots" " WHERE failover and NOT temporary"); /* Execute the query */ res = walrcv_exec(wrconn, s.data, SLOTSYNC_COLUMN_COUNT, slotRow); pfree(s.data); We don't need 's' as the query is constant. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
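Incidentally, the query quoted above can be run verbatim on the primary, which is handy when checking by hand which slots the synchronization would pick up:

SELECT slot_name, plugin, confirmed_flush_lsn,
       restart_lsn, catalog_xmin, two_phase, failover,
       database, conflict_reason
FROM pg_catalog.pg_replication_slots
WHERE failover AND NOT temporary;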
On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I've reviewed the v91 patch. Here are random comments: Thanks for the comments. > --- > /* > * Checks the remote server info. > * > - * We ensure that the 'primary_slot_name' exists on the remote server and the > - * remote server is not a standby node. > + * Check whether we are a cascading standby. For non-cascading standbys, it > + * also ensures that the 'primary_slot_name' exists on the remote server. > */ > > IIUC what the validate_remote_info() does doesn't not change by this > patch, so the previous comment seems to be clearer to me. > > --- > if (remote_in_recovery) > + { > + /* > + * If we are a cascading standby, no need to check further for > + * 'primary_slot_name'. > + */ > ereport(ERROR, > errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("cannot synchronize replication slots from a > standby server")); > + } > + else > + { > + bool primary_slot_valid; > > - primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > - Assert(!isnull); > + primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > + Assert(!isnull); > > - if (!primary_slot_valid) > - ereport(ERROR, > - errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("bad configuration for slot synchronization"), > - /* translator: second %s is a GUC variable name */ > - errdetail("The replication slot \"%s\" specified by > \"%s\" does not exist on the primary server.", > - PrimarySlotName, "primary_slot_name")); > + if (!primary_slot_valid) > + ereport(ERROR, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("bad configuration for slot synchronization"), > + /* translator: second %s is a GUC variable name */ > + errdetail("The replication slot \"%s\" specified > by \"%s\" does not exist on the primary server.", > + PrimarySlotName, "primary_slot_name")); > + } > > I think it's a refactoring rather than changes required by the > slotsync worker. We can do that in a separate patch but why do we need > this change in the first place? In v90, this refactoring was made due to the fact that validate_remote_info() was supposed to behave differently for SQL function and slot-sync worker. SQL-function was supposed to ERROR out while the worker was supposed to LOG and become no-op. And thus the change was needed. In v91, we made this functionality same i.e. both sql function and worker will error out but missed to remove this refactoring. Thanks for catching this, I will revert it in the next version. To match the refactoring, I made the comment change too, will revert that as well. > --- > + ValidateSlotSyncParams(ERROR); > + > /* Load the libpq-specific functions */ > load_file("libpqwalreceiver", false); > > - ValidateSlotSyncParams(); > + (void) CheckDbnameInConninfo(); > > Is there any reason why we move where to check the parameters? Earlier DBname verification was done inside ValidateSlotSyncParams() and thus it was needed to load 'libpqwalreceiver' before we call this function. Now we have moved dbname verification in a separate call and thus the above order got changed. ValidateSlotSyncParams() is a common function used by SQL function and worker. Earlier slot sync worker was checking all the GUCs after starting up and was exiting each time any GUC was invalid. It was suggested that it would be better to check the GUCs before starting the slot sync worker in the postmaster itself, making the ValidateSlotSyncParams() move to postmaster (see SlotSyncWorkerAllowed). 
But it was not a good idea to load libpq in the postmaster, and thus we moved the libpq-related verification out of ValidateSlotSyncParams(). This resulted in the above change. I hope this answers your query. thanks Shveta
On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Some comments not related to the patch but to the existing code: > > --- > It might have already been discussed but is the > src/backend/replication/logical the right place for the slocsync.c? If > it's independent of logical decoding/replication, is under > src/backend/replication could be more appropriate? > This point has not been discussed, so thanks for raising it. I think the reasoning behind keeping it in logical is that this file contains the code for logical slot syncing and a worker doing that. As it is mostly about logical slot syncing, there is an argument to keep it under logical. In the future, we may need to extend this functionality to have a per-db slot sync worker as well, in which case it will probably again be somewhat related to logical slots. Having said that, there is an argument to keep it under replication as well because the functionality it provides is for physical standbys. -- With Regards, Amit Kapila.
On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I've reviewed the v91 patch. Here are random comments: > Thanks for the comments; they are addressed in v92. The slotsync.c file is still under 'logical'; I am waiting for that discussion to be concluded. v92 also addresses some off-list comments given by Amit and Hou-San. The changes are in patch001; the rest of the patches are rebased. thanks Shveta
Attachment
On Tue, Feb 20, 2024 at 12:33 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I've reviewed the v91 patch. Here are random comments: > > Thanks for the comments. > > > --- > > /* > > * Checks the remote server info. > > * > > - * We ensure that the 'primary_slot_name' exists on the remote server and the > > - * remote server is not a standby node. > > + * Check whether we are a cascading standby. For non-cascading standbys, it > > + * also ensures that the 'primary_slot_name' exists on the remote server. > > */ > > > > IIUC what the validate_remote_info() does doesn't not change by this > > patch, so the previous comment seems to be clearer to me. > > > > --- > > if (remote_in_recovery) > > + { > > + /* > > + * If we are a cascading standby, no need to check further for > > + * 'primary_slot_name'. > > + */ > > ereport(ERROR, > > errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > errmsg("cannot synchronize replication slots from a > > standby server")); > > + } > > + else > > + { > > + bool primary_slot_valid; > > > > - primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > > - Assert(!isnull); > > + primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull)); > > + Assert(!isnull); > > > > - if (!primary_slot_valid) > > - ereport(ERROR, > > - errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > - errmsg("bad configuration for slot synchronization"), > > - /* translator: second %s is a GUC variable name */ > > - errdetail("The replication slot \"%s\" specified by > > \"%s\" does not exist on the primary server.", > > - PrimarySlotName, "primary_slot_name")); > > + if (!primary_slot_valid) > > + ereport(ERROR, > > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("bad configuration for slot synchronization"), > > + /* translator: second %s is a GUC variable name */ > > + errdetail("The replication slot \"%s\" specified > > by \"%s\" does not exist on the primary server.", > > + PrimarySlotName, "primary_slot_name")); > > + } > > > > I think it's a refactoring rather than changes required by the > > slotsync worker. We can do that in a separate patch but why do we need > > this change in the first place? > > In v90, this refactoring was made due to the fact that > validate_remote_info() was supposed to behave differently for SQL > function and slot-sync worker. SQL-function was supposed to ERROR out > while the worker was supposed to LOG and become no-op. And thus the > change was needed. In v91, we made this functionality same i.e. both > sql function and worker will error out but missed to remove this > refactoring. Thanks for catching this, I will revert it in the next > version. To match the refactoring, I made the comment change too, will > revert that as well. > > > --- > > + ValidateSlotSyncParams(ERROR); > > + > > /* Load the libpq-specific functions */ > > load_file("libpqwalreceiver", false); > > > > - ValidateSlotSyncParams(); > > + (void) CheckDbnameInConninfo(); > > > > Is there any reason why we move where to check the parameters? > > Earlier DBname verification was done inside ValidateSlotSyncParams() > and thus it was needed to load 'libpqwalreceiver' before we call this > function. Now we have moved dbname verification in a separate call and > thus the above order got changed. ValidateSlotSyncParams() is a common > function used by SQL function and worker. 
Earlier slot sync worker was > checking all the GUCs after starting up and was exiting each time any > GUC was invalid. It was suggested that it would be better to check the > GUCs before starting the slot sync worker in the postmaster itself, > making the ValidateSlotSyncParams() move to postmaster (see > SlotSyncWorkerAllowed). But it was not a good idea to load libpq in > postmaster and thus we moved libpq related verification out of > ValidateSlotSyncParams(). This resulted in the above change. I hope > it answers your query. Thank you for the explanation. It makes sense to me to move the check. As for ValidateSlotSyncParams() called by SlotSyncWorkerAllowed(), I have two comments: 1. The error messages are not very descriptive and do not seem to match the other messages the postmaster emits. When starting the standby server with a slotsync misconfiguration, I got the following messages from the postmaster: 2024-02-20 17:01:16.356 JST [456741] LOG: database system is ready to accept read-only connections 2024-02-20 17:01:16.358 JST [456741] LOG: bad configuration for slot synchronization 2024-02-20 17:01:16.358 JST [456741] HINT: "hot_standby_feedback" must be enabled. It says "bad configuration" but keeps working, and does not give further information, such as whether it skipped starting the slotsync worker. I think these messages could work for the slotsync worker, but we might want to have more descriptive messages for the postmaster. For example, "skipped starting slot sync worker because hot_standby_feedback is disabled". 2. If the wal_level is not logical, the server will need to restart anyway to change the wal_level and have the slotsync worker work. Does it make sense to have the postmaster exit if the wal_level is not logical and sync_replication_slots is enabled? For instance, we have similar checks in PostmasterMain(): if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL) ereport(ERROR, (errmsg("WAL cannot be summarized when wal_level is \"minimal\""))); Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
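For illustration, a minimal sketch of the startup-time check suggested above, modeled on the quoted summarize_wal example; the GUC name sync_replication_slots comes from this thread, while the exact placement, error code, and message wording are assumptions rather than the committed behavior:

    /*
     * Sketch only: error out at postmaster startup when slot synchronization
     * is requested but wal_level cannot support it.
     */
    if (sync_replication_slots && wal_level < WAL_LEVEL_LOGICAL)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("replication slot synchronization requires wal_level >= \"logical\"")));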
On Tue, Feb 20, 2024 at 12:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Some comments not related to the patch but to the existing code: > > > > --- > > It might have already been discussed but is the > > src/backend/replication/logical the right place for the slocsync.c? If > > it's independent of logical decoding/replication, is under > > src/backend/replication could be more appropriate? > > Thank you for the comment. > > This point has not been discussed, so thanks for raising it. I think > the reasoning behind keeping it in logical is that this file contains > a code for logical slot syncing and a worker doing that. As it is > mostly about logical slot syncing so there is an argument to keep it > under logical. In the future, we may need to extend this functionality > to have a per-db slot sync worker as well in which case it will > probably be again somewhat related to logical slots. That's a valid argument. > Having said that, > there is an argument to keep it under replication as well because the > functionality it provides is for physical standbys. Another argument to keep it under replication is that all files under the replication/logical directory are logical decoding and logical replication infrastructure, IOW functionality built on top of (logical) replication slots. On the other hand, the slotsync worker (and slotsync functionality) looks like a part of slot management functionality, which seems to be at the same layer as slot.c. BTW, the slotsync.c of the v91 patch includes "replication/logical.h", but it isn't actually necessary; #include'ing "replication/slot.h" is sufficient. Regards -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Thank you for the explanation. It makes sense to me to move the check. > > As for ValidateSlotSyncParams() called by SlotSyncWorkerAllowed(), I > have two comments: > > 1. The error messages are not very descriptive and seem not to match > other messages the postmaster says. When starting the standby server > with misconfiguration about the slotsync, I got the following messages > from the postmaster: > > 2024-02-20 17:01:16.356 JST [456741] LOG: database system is ready to > accept read-only connections > 2024-02-20 17:01:16.358 JST [456741] LOG: bad configuration for slot > synchronization > 2024-02-20 17:01:16.358 JST [456741] HINT: "hot_standby_feedback" > must be enabled. > > It says "bad configuration" but is still working, and does not say > further information such as whether it skipped to start the slotsync > worker etc. I think these messages could work for the slotsync worker > but we might want to have more descriptive messages for the > postmaster. For example, "skipped starting slot sync worker because > hot_standby_feedback is disabled". > We are planning to change it to something like:"slot synchronization requires hot_standby_feedback to be enabled". See [1] > 2. If the wal_level is not logical, the server will need to restart > anyway to change the wal_level and have the slotsync worker work. Does > it make sense to have the postmaster exit if the wal_level is not > logical and sync_replication_slots is enabled? For instance, we have > similar checks in PostmsaterMain(): > > if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL) > ereport(ERROR, > (errmsg("WAL cannot be summarized when wal_level is > \"minimal\""))); > +1. I think giving an error in this case makes sense. Miscellaneous comments: ======================== 1. +void +ShutDownSlotSync(void) +{ + SpinLockAcquire(&SlotSyncCtx->mutex); + + SlotSyncCtx->stopSignaled = true; This flag is never reset back. I think we should reset this once the promotion is complete. Though offhand, I don't see any problem with this but it doesn't look clean and can be a source of bugs in the future. 2. +char * +CheckDbnameInConninfo(void) { char *dbname; Let's name this function as CheckAndGetDbnameFromConninfo(). Apart from the above, I have made cosmetic changes in the attached. [1] - https://www.postgresql.org/message-id/CAJpy0uBWomyAjP0zyFdzhGxn%2BXsAb2OdJA%2BKfNyZRv2nV6PD9g%40mail.gmail.com -- With Regards, Amit Kapila.
Attachment
On Tue, Feb 20, 2024 at 6:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Miscellaneous comments: Thanks for the comments. > ======================== > 1. > +void > +ShutDownSlotSync(void) > +{ > + SpinLockAcquire(&SlotSyncCtx->mutex); > + > + SlotSyncCtx->stopSignaled = true; > > This flag is never reset back. I think we should reset this once the > promotion is complete. Though offhand, I don't see any problem with > this but it doesn't look clean and can be a source of bugs in the > future. I reset the flag in MaybeStartSlotSyncWorker() when we attempt to start the worker after promotion completes and find that stopSignaled is true while pmState is PM_RUN. From that point onwards, we can rely on pmState to prevent the launch of the slot sync worker and thus can reset stopSignaled. > 2. > +char * > +CheckDbnameInConninfo(void) > { > char *dbname; > > Let's name this function as CheckAndGetDbnameFromConninfo(). Modified. > Apart from the above, I have made cosmetic changes in the attached. Included these changes. Thanks. Here are the v93 patches. They also address Sawada-san's comment about converting LOG to ERROR when wal_level < logical. I have also incorporated one more change wherein we check that 'Shutdown <= SmartShutdown' before launching the slot sync worker. Since we do not need the slot sync process to help with the rest of the shutdown process, it is better not to start it when a shutdown (immediate or fast) is in progress. I have done this based on the details in [1]. It is similar to the WalReceiver behaviour. Thoughts? [1]: https://www.postgresql.org/message-id/flat/CAJpy0uCeQm2aFJLkx-D0BeAEvSdViTZf4wD7zT9coDHfLv1NaA%40mail.gmail.com thanks Shveta
Attachment
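For readers following along, a rough sketch of the reset described above; MaybeStartSlotSyncWorker(), IsStopSignaledSet() and ResetStopSignaled() are names from the patch under discussion, and the body here only illustrates the idea, not the actual code:

    static void
    MaybeStartSlotSyncWorker(void)
    {
        /*
         * stopSignaled was set while the worker was shut down during
         * promotion.  Once the postmaster is back in normal running state,
         * pmState alone is enough to gate the worker, so clear the flag.
         */
        if (IsStopSignaledSet() && pmState == PM_RUN)
            ResetStopSignaled();

        /* ... existing checks and worker launch logic ... */
    }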
On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Feb 20, 2024 at 12:33 PM shveta malik <shveta.malik@gmail.com> wrote: > Thank you for the explanation. It makes sense to me to move the check. > > > 2. If the wal_level is not logical, the server will need to restart > anyway to change the wal_level and have the slotsync worker work. Does > it make sense to have the postmaster exit if the wal_level is not > logical and sync_replication_slots is enabled? For instance, we have > similar checks in PostmsaterMain(): > > if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL) > ereport(ERROR, > (errmsg("WAL cannot be summarized when wal_level is > \"minimal\""))); Thanks for the feedback. I have addressed it in v93. thanks SHveta
On Wed, Feb 21, 2024 at 12:15 PM shveta malik <shveta.malik@gmail.com> wrote: > > > Thanks for the feedback. I have addressed it in v93. > A few minor comments: ================= 1. +/* + * Is stopSignaled set in SlotSyncCtx? + */ +bool +IsStopSignaledSet(void) +{ + bool signaled; + + SpinLockAcquire(&SlotSyncCtx->mutex); + signaled = SlotSyncCtx->stopSignaled; + SpinLockRelease(&SlotSyncCtx->mutex); + + return signaled; +} + +/* + * Reset stopSignaled in SlotSyncCtx. + */ +void +ResetStopSignaled(void) +{ + SpinLockAcquire(&SlotSyncCtx->mutex); + SlotSyncCtx->stopSignaled = false; + SpinLockRelease(&SlotSyncCtx->mutex); +} I think these newly introduced functions don't need spinlock to be acquired as these are just one-byte read-and-write. Additionally, when IsStopSignaledSet() is invoked, there shouldn't be any concurrent process to update that value. What do you think? 2. +REPL_SLOTSYNC_MAIN "Waiting in main loop of slot sync worker." +REPL_SLOTSYNC_SHUTDOWN "Waiting for slot sync worker to shut down." Let's use REPLICATION instead of REPL. I see other wait events using REPLICATION in their names. 3. - * In standalone mode and in autovacuum worker processes, we use a fixed - * ID, otherwise we figure it out from the authenticated user name. + * In standalone mode, autovacuum worker processes and slot sync worker + * process, we use a fixed ID, otherwise we figure it out from the + * authenticated user name. */ - if (bootstrap || IsAutoVacuumWorkerProcess()) + if (bootstrap || IsAutoVacuumWorkerProcess() || IsLogicalSlotSyncWorker()) { InitializeSessionUserIdStandalone(); am_superuser = true; IIRC, we discussed this previously and it is safe to make the local connection as superuser as we don't consult any user tables, so we can probably add a comment where we invoke InitPostgres in slotsync.c 4. $publisher->safe_psql('postgres', - "CREATE PUBLICATION regress_mypub FOR ALL TABLES;"); + "CREATE PUBLICATION regress_mypub FOR ALL TABLES;" +); Why this change is required in the patch? 5. +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot are synced +# to the standby /and of/; looks like a typo 6. +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot are synced +# to the standby +ok( $standby1->poll_query_until( + 'postgres', + "SELECT '$primary_restart_lsn' = restart_lsn AND '$primary_flush_lsn' = confirmed_flush_lsn from pg_replication_slots WHERE slot_name = 'lsub1_slot';"), + 'restart_lsn and confirmed_flush_lsn of slot lsub1_slot synced to standby'); + ... ... +# Confirm the synced slot 'lsub1_slot' is retained on the new primary +is($standby1->safe_psql('postgres', + q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}), + 'lsub1_slot', + 'synced slot retained on the new primary'); In both these checks, we should additionally check the 'synced' and 'temporary' flags to ensure that they are marked appropriately. -- With Regards, Amit Kapila.
On Wed, Feb 21, 2024 at 5:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > A few minor comments: Thanks for the feedback. > ================= > 1. > +/* > + * Is stopSignaled set in SlotSyncCtx? > + */ > +bool > +IsStopSignaledSet(void) > +{ > + bool signaled; > + > + SpinLockAcquire(&SlotSyncCtx->mutex); > + signaled = SlotSyncCtx->stopSignaled; > + SpinLockRelease(&SlotSyncCtx->mutex); > + > + return signaled; > +} > + > +/* > + * Reset stopSignaled in SlotSyncCtx. > + */ > +void > +ResetStopSignaled(void) > +{ > + SpinLockAcquire(&SlotSyncCtx->mutex); > + SlotSyncCtx->stopSignaled = false; > + SpinLockRelease(&SlotSyncCtx->mutex); > +} > > I think these newly introduced functions don't need spinlock to be > acquired as these are just one-byte read-and-write. Additionally, when > IsStopSignaledSet() is invoked, there shouldn't be any concurrent > process to update that value. What do you think? Yes, we can avoid taking spinlock here. These functions are invoked after checking that pmState is PM_RUN. And in that state we do not expect any other process writing this flag. > 2. > +REPL_SLOTSYNC_MAIN "Waiting in main loop of slot sync worker." > +REPL_SLOTSYNC_SHUTDOWN "Waiting for slot sync worker to shut down." > > Let's use REPLICATION instead of REPL. I see other wait events using > REPLICATION in their names. Modified. > 3. > - * In standalone mode and in autovacuum worker processes, we use a fixed > - * ID, otherwise we figure it out from the authenticated user name. > + * In standalone mode, autovacuum worker processes and slot sync worker > + * process, we use a fixed ID, otherwise we figure it out from the > + * authenticated user name. > */ > - if (bootstrap || IsAutoVacuumWorkerProcess()) > + if (bootstrap || IsAutoVacuumWorkerProcess() || IsLogicalSlotSyncWorker()) > { > InitializeSessionUserIdStandalone(); > am_superuser = true; > > IIRC, we discussed this previously and it is safe to make the local > connection as superuser as we don't consult any user tables, so we can > probably add a comment where we invoke InitPostgres in slotsync.c Added comment. Thanks Hou-San for the analysis here and providing comment. > 4. > $publisher->safe_psql('postgres', > - "CREATE PUBLICATION regress_mypub FOR ALL TABLES;"); > + "CREATE PUBLICATION regress_mypub FOR ALL TABLES;" > +); > > Why this change is required in the patch? Not needed, removed it. > 5. > +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot > are synced > +# to the standby > > /and of/; looks like a typo Modified. > 6. > +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot > are synced > +# to the standby > +ok( $standby1->poll_query_until( > + 'postgres', > + "SELECT '$primary_restart_lsn' = restart_lsn AND > '$primary_flush_lsn' = confirmed_flush_lsn from pg_replication_slots > WHERE slot_name = 'lsub1_slot';"), > + 'restart_lsn and confirmed_flush_lsn of slot lsub1_slot synced to standby'); > + > ... > ... > +# Confirm the synced slot 'lsub1_slot' is retained on the new primary > +is($standby1->safe_psql('postgres', > + q{SELECT slot_name FROM pg_replication_slots WHERE slot_name = > 'lsub1_slot';}), > + 'lsub1_slot', > + 'synced slot retained on the new primary'); > > In both these checks, we should additionally check the 'synced' and > 'temporary' flags to ensure that they are marked appropriately. Modified. Please find patch001 attached. There is a CFBot failure in patch002. The test added there needs some adjustment. 
We will rebase and post the rest of the patches once we fix that issue. thanks Shveta
Attachment
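To make the agreed change concrete, a sketch of the spinlock-free variants discussed above, under the assumption that stopSignaled is a single byte and has no concurrent writer at the time IsStopSignaledSet() runs; this is an illustration, not the committed code:

    bool
    IsStopSignaledSet(void)
    {
        /* Single-byte read; no concurrent writer expected once pmState is PM_RUN. */
        return SlotSyncCtx->stopSignaled;
    }

    void
    ResetStopSignaled(void)
    {
        /* Single-byte write; safe without the spinlock for the same reason. */
        SlotSyncCtx->stopSignaled = false;
    }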
On Thu, Feb 22, 2024 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote: > > Please find patch001 attached. There is a CFBot failure in patch002. > The test added there needs some adjustment. We will rebase and post > rest of the patches once we fix that issue. > There was a recent commit 801792e to improve error messaging in slotsync.c, which resulted in a conflict. Thus, I have rebased the patch. There is no new change in the attached patch. thanks Shveta
Attachment
Hi, On Thu, Feb 22, 2024 at 12:16:34PM +0530, shveta malik wrote: > On Thu, Feb 22, 2024 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote: > There was a recent commit 801792e to improve error messaging in > slotsync.c which resulted in conflict. Thus rebased the patch. There > is no new change in the patch attached Thanks! Some random comments about v92_001 (Sorry if it has already been discussed up-thread): 1 === + * We do not update the 'synced' column from true to false here Worth mentioning which system view the 'synced' column belongs to? 2 === (Nit) +#define MIN_WORKER_NAPTIME_MS 200 +#define MAX_WORKER_NAPTIME_MS 30000 /* 30s */ [MIN|MAX]_SLOTSYNC_WORKER_NAPTIME_MS instead? It is used only in slotsync.c, so more of a nit. 3 === res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow); - if (res->status != WALRCV_OK_TUPLES) Line removal intended? 4 === + if (wal_level < WAL_LEVEL_LOGICAL) + { + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("slot synchronization requires wal_level >= \"logical\"")); + return false; + } I think the return is not needed here as it won't be reached due to the "ERROR". Or should we use "elevel" instead of "ERROR"? 5 === + * operate as a superuser. This is safe because the slot sync worker does + * not interact with user tables, eliminating the risk of executing + * arbitrary code within triggers. Right. I did not check, but if we are using operators in our remote SPI calls, then it would be worth ensuring they come from the pg_catalog schema, using something like "OPERATOR(pg_catalog.=)" with "=" as an example. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > Thanks! > > Some random comments about v92_001 (Sorry if it has already been discussed > up-thread): Thanks for the feedback. The patch is pushed 15 minutes back. I will prepare a top-up patch for your comments. > 1 === > > + * We do not update the 'synced' column from true to false here > > Worth to mention from which system view the 'synced' column belongs to? Sure, I will change it. > 2 === (Nit) > > +#define MIN_WORKER_NAPTIME_MS 200 > +#define MAX_WORKER_NAPTIME_MS 30000 /* 30s */ > > [MIN|MAX]_SLOTSYNC_WORKER_NAPTIME_MS instead? It is used only in slotsync.c so > more a Nit. Okay, will change it, > 3 === > > res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow); > - > if (res->status != WALRCV_OK_TUPLES) > > Line removal intended? I feel the current style is better, where we do not have space between the function call and return value checking. > 4 === > > + if (wal_level < WAL_LEVEL_LOGICAL) > + { > + ereport(ERROR, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("slot synchronization requires wal_level >= \"logical\"")); > + return false; > + } > > I think the return is not needed here as it won't be reached due to the "ERROR". > Or should we use "elevel" instead of "ERROR"? It was suggested to raise ERROR for wal_level validation, please see [1]. But yes, I will remove the return value. Thanks for catching this. > 5 === > > + * operate as a superuser. This is safe because the slot sync worker does > + * not interact with user tables, eliminating the risk of executing > + * arbitrary code within triggers. > > Right. I did not check but if we are using operators in our remote SPI calls > then it would be worth to ensure they are coming from the pg_catalog schema? > Using something like "OPERATOR(pg_catalog.=)" using "=" as an example. Can you please elaborate this one, I am not sure if I understood it. [1]: https://www.postgresql.org/message-id/CAD21AoB2ipSzQb5-o5pEYKie4oTPJTsYR1ip9_wRVrF6HbBWDQ%40mail.gmail.com thanks Shveta
Hi, On Thu, Feb 22, 2024 at 04:01:34PM +0530, shveta malik wrote: > On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > Thanks! > > > > Some random comments about v92_001 (Sorry if it has already been discussed > > up-thread): > > Thanks for the feedback. The patch is pushed 15 minutes back. Yeah, saw that after I send the comments ;-) > I will > prepare a top-up patch for your comments. Thanks! > > 4 === > > > > + if (wal_level < WAL_LEVEL_LOGICAL) > > + { > > + ereport(ERROR, > > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > + errmsg("slot synchronization requires wal_level >= \"logical\"")); > > + return false; > > + } > > > > I think the return is not needed here as it won't be reached due to the "ERROR". > > Or should we use "elevel" instead of "ERROR"? > > It was suggested to raise ERROR for wal_level validation, please see > [1]. But yes, I will remove the return value. Yeah, thanks, ERROR makes sense here. > > 5 === > > > > + * operate as a superuser. This is safe because the slot sync worker does > > + * not interact with user tables, eliminating the risk of executing > > + * arbitrary code within triggers. > > > > Right. I did not check but if we are using operators in our remote SPI calls > > then it would be worth to ensure they are coming from the pg_catalog schema? > > Using something like "OPERATOR(pg_catalog.=)" using "=" as an example. > > Can you please elaborate this one, I am not sure if I understood it. Suppose that in synchronize_slots() the query would be: const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," " restart_lsn, catalog_xmin, two_phase, failover," " database, conflict_reason" " FROM pg_catalog.pg_replication_slots" " WHERE failover and NOT temporary and 1 = 1"; Then my comment is to rewrite it to: const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," " restart_lsn, catalog_xmin, two_phase, failover," " database, conflict_reason" " FROM pg_catalog.pg_replication_slots" " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; to ensure the operator "=" is coming from the pg_catalog schema. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Feb 22, 2024 at 04:01:34PM +0530, shveta malik wrote: > > On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > Thanks! > > > > > > Some random comments about v92_001 (Sorry if it has already been discussed > > > up-thread): > > > > Thanks for the feedback. The patch is pushed 15 minutes back. > > Yeah, saw that after I send the comments ;-) > There is a BF failure. See https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-22%2010%3A13%3A03. The initial analysis suggests that for some reason, the primary went down after the slot sync worker was invoked the first time. See the below in the primary's LOG: 2024-02-22 10:59:56.896 UTC [2721639:29] standby1_slotsync worker LOG: 00000: statement: SELECT slot_name, plugin, confirmed_flush_lsn, restart_lsn, catalog_xmin, two_phase, failover, database, conflict_reason FROM pg_catalog.pg_replication_slots WHERE failover and NOT temporary 2024-02-22 10:59:56.896 UTC [2721639:30] standby1_slotsync worker LOCATION: exec_simple_query, postgres.c:1070 2024-02-22 11:00:26.967 UTC [2721639:31] standby1_slotsync worker LOG: 00000: statement: SELECT slot_name, plugin, confirmed_flush_lsn, restart_lsn, catalog_xmin, two_phase, failover, database, conflict_reason FROM pg_catalog.pg_replication_slots WHERE failover and NOT temporary 2024-02-22 11:00:26.967 UTC [2721639:32] standby1_slotsync worker LOCATION: exec_simple_query, postgres.c:1070 2024-02-22 11:00:35.908 UTC [2721435:309] LOG: 00000: received immediate shutdown request 2024-02-22 11:00:35.908 UTC [2721435:310] LOCATION: process_pm_shutdown_request, postmaster.c:2859 2024-02-22 11:00:35.911 UTC [2721435:311] LOG: 00000: database system is shut down 2024-02-22 11:00:35.911 UTC [2721435:312] LOCATION: UnlinkLockFiles, miscinit.c:1138 -- With Regards, Amit Kapila.
On Thu, Feb 22, 2024 at 5:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Feb 22, 2024 at 04:01:34PM +0530, shveta malik wrote: > > > On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > Thanks! > > > > > > > > Some random comments about v92_001 (Sorry if it has already been discussed > > > > up-thread): > > > > > > Thanks for the feedback. The patch is pushed 15 minutes back. > > > > Yeah, saw that after I send the comments ;-) > > > There is a BF failure. See > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-02-22%2010%3A13%3A03. > > The initial analysis suggests that for some reason, the primary went > down after the slot sync worker was invoked the first time. See the > below in the primary's LOG: > The reason is that the test failed waiting on the below LOG: ### Reloading node "standby1" # Running: pg_ctl -D /home/ec2-user/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_040_standby_failover_slots_sync_standby1_data/pgdata reload server signaled timed out waiting for match: (?^:LOG: slot sync worker started) at t/040_standby_failover_slots_sync.pl line 376. Now, on the standby, we see a LOG line like 2024-02-22 10:57:35.432 UTC [2721638:1] LOG: 00000: slot sync worker started. Even then the test failed, and the reason is that there is an extra "00000" before the actual message, which is due to log_error_verbosity = verbose in the config. I think the test's log-line matching here needs to be more robust. -- With Regards, Amit Kapila.
On Thursday, February 22, 2024 8:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 22, 2024 at 5:23 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > On Thu, Feb 22, 2024 at 04:01:34PM +0530, shveta malik wrote: > > > > On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > > > > > > Thanks! > > > > > > > > > > Some random comments about v92_001 (Sorry if it has already been > > > > > discussed > > > > > up-thread): > > > > > > > > Thanks for the feedback. The patch is pushed 15 minutes back. > > > > > > Yeah, saw that after I send the comments ;-) > > > > > > > There is a BF failure. See > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2024-0 > 2-22%2010%3A13%3A03. > > > > The initial analysis suggests that for some reason, the primary went > > down after the slot sync worker was invoked the first time. See the > > below in the primary's LOG: > > > > The reason is that the test failed waiting on below LOG: > > ### Reloading node "standby1" > # Running: pg_ctl -D > /home/ec2-user/bf/root/HEAD/pgsql.build/src/test/recovery/tmp_check/t_ > 040_standby_failover_slots_sync_standby1_data/pgdata > reload > server signaled > timed out waiting for match: (?^:LOG: slot sync worker started) at > t/040_standby_failover_slots_sync.pl line 376. > > Now, on standby, we see a LOG like 2024-02-22 10:57:35.432 UTC [2721638:1] > LOG: 00000: slot sync worker started. Even then the test failed and the reason is > that it has an extra 0000 before the actual message which is due to > log_error_verbosity = verbose in config. I think here the test's log matching > code needs to have a more robust log line matching code. Agreed. Here is a small patch to change the msg in wait_for_log so that it only search the message part. Best Regards, Hou zj
Attachment
Hi, Since the slotsync worker patch has been committed, I rebased the remaining patches. And here is the V95 patch set. Also, I fixed a bug in the current 0001 patch where the members of the standby slot names list pointed to freed memory after calling ProcessConfigFile(). Now, we will obtain a new list when we call ProcessConfigFile(). The optimization to only get the new list when the names actually change has been removed. I think this change is acceptable because ProcessConfigFile is not called frequently. Additionally, I reordered the tests in 040_standby_failover_slots_sync.pl. Now the new test will be conducted after the sync slot test to prevent the risk of the logical slot occasionally not catching up to the latest catalog_xmin and, as a result, not being able to be synced immediately. Best Regards, Hou zj
Attachment
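A rough sketch of the reload path being described, for illustration only; ProcessConfigFile() and ConfigReloadPending are existing backend facilities and GetStandbySlotList() is from this patch set, but the surrounding variable names and placement are assumptions:

    if (ConfigReloadPending)
    {
        ConfigReloadPending = false;
        ProcessConfigFile(PGC_SIGHUP);

        /*
         * The previously obtained list may point into GUC memory freed by the
         * reload, so unconditionally fetch a fresh copy instead of trying to
         * detect whether standby_slot_names actually changed.
         */
        list_free(standby_slots);
        standby_slots = GetStandbySlotList(true);
    }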
On Friday, February 23, 2024 10:02 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Hi, > > Since the slotsync worker patch has been committed, I rebased the remaining > patches. > And here is the V95 patch set. > > Also, I fixed a bug in the current 0001 patch where the member of the standby > slot names list pointed to the freed memory after calling ProcessConfigFile(). > Now, we will obtain a new list when we call ProcessConfigFile(). The > optimization to only get the new list when the names actually change has been > removed. I think this change is acceptable because ProcessConfigFile is not a > frequent occurrence. > > Additionally, I reordered the tests in 040_standby_failover_slots_sync.pl. Now > the new test will be conducted after the sync slot test to prevent the risk of the > logical slot occasionally not catching up to the latest catalog_xmin and, as a > result, not being able to be synced immediately. There is one unexpected change in the previous version, sorry for that. Here is the correct version. Best Regards, Hou zj
Attachment
On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Suppose that in synchronize_slots() the query would be: > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > " restart_lsn, catalog_xmin, two_phase, failover," > " database, conflict_reason" > " FROM pg_catalog.pg_replication_slots" > " WHERE failover and NOT temporary and 1 = 1"; > > Then my comment is to rewrite it to: > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > " restart_lsn, catalog_xmin, two_phase, failover," > " database, conflict_reason" > " FROM pg_catalog.pg_replication_slots" > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; > > to ensure the operator "=" is coming from the pg_catalog schema. > Thanks for the details, but slot-sync does not use SPI calls, it uses libpqrcv calls. So is this change needed? thanks Shveta
On Fri, Feb 23, 2024 at 8:35 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Suppose that in synchronize_slots() the query would be: > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > " restart_lsn, catalog_xmin, two_phase, failover," > > " database, conflict_reason" > > " FROM pg_catalog.pg_replication_slots" > > " WHERE failover and NOT temporary and 1 = 1"; > > > > Then my comment is to rewrite it to: > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > " restart_lsn, catalog_xmin, two_phase, failover," > > " database, conflict_reason" > > " FROM pg_catalog.pg_replication_slots" > > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; > > > > to ensure the operator "=" is coming from the pg_catalog schema. > > > > Thanks for the details, but slot-sync does not use SPI calls, it uses > libpqrcv calls. So is this change needed? Additionally, I would like to have a better understanding of why it's necessary and whether it addresses any potential security risks. thanks Shveta
On Friday, February 23, 2024 10:18 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > Hi, > > > > Since the slotsync worker patch has been committed, I rebased the > > remaining patches. > > And here is the V95 patch set. > > > > Also, I fixed a bug in the current 0001 patch where the member of the > > standby slot names list pointed to the freed memory after calling > ProcessConfigFile(). > > Now, we will obtain a new list when we call ProcessConfigFile(). The > > optimization to only get the new list when the names actually change > > has been removed. I think this change is acceptable because > > ProcessConfigFile is not a frequent occurrence. > > > > Additionally, I reordered the tests in > > 040_standby_failover_slots_sync.pl. Now the new test will be conducted > > after the sync slot test to prevent the risk of the logical slot > > occasionally not catching up to the latest catalog_xmin and, as a result, not > being able to be synced immediately. > > There is one unexpected change in the previous version, sorry for that. > Here is the correct version. I noticed one CFbot failure[1] which is because the tap-test doesn't wait for the standby to catch up before promoting, thus the data inserted after promotion could not be replicated to the subscriber. Add a wait_for_replay_catchup to fix it. Apart from this, I also adjusted some variable names in the tap-test to be consistent. And added back a mis-removed ProcessConfigFile call. [1] https://cirrus-ci.com/task/6126787437002752?logs=check_world#L312 Best Regards, Hou zj
Attachment
On Fri, Feb 23, 2024 at 10:06 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > I noticed one CFbot failure[1] which is because the tap-test doesn't wait for the > standby to catch up before promoting, thus the data inserted after promotion > could not be replicated to the subscriber. Add a wait_for_replay_catchup to fix it. > > Apart from this, I also adjusted some variable names in the tap-test to be > consistent. And added back a mis-removed ProcessConfigFile call. > > [1] https://cirrus-ci.com/task/6126787437002752?logs=check_world#L312 > Thanks for the patches. Had a quick look at v95_2, here are some trivial comments: slot.h: ----- 1) extern List *GetStandbySlotList(bool copy); extern void WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn); extern void FilterStandbySlots(XLogRecPtr wait_for_lsn, List **standby_slots); The order is different from the one in slot.c slot.c: ----- 2) warningfmt = _("replication slot \"%s\" specified in parameter \"%s\" does not exist, ignoring"); GUC names should not have double quotes. Same in each warningfmt in this function 3) errmsg("replication slot \"%s\" specified in parameter \"%s\" does not have active_pid", Same here, double quotes around standby_slot_names should be removed walsender.c: ------ 4) * Used by logical decoding SQL functions that acquired slot with failover * enabled. To be consistent with other such comments in previous patches: slot with failover enabled --> failover enabled slot 5) Wake up the logical walsender processes with failover-enabled slots failover-enabled slots --> failover enabled slots postgresql.conf.sample: ---------- 6) streaming replication standby server slot names that logical walsender processes will wait for Is it better to say it like this? (I leave this to your preference) streaming replication standby server slot names for which logical walsender processes will wait. thanks Shveta
Hi, On Fri, Feb 23, 2024 at 08:35:44AM +0530, shveta malik wrote: > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Suppose that in synchronize_slots() the query would be: > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > " restart_lsn, catalog_xmin, two_phase, failover," > > " database, conflict_reason" > > " FROM pg_catalog.pg_replication_slots" > > " WHERE failover and NOT temporary and 1 = 1"; > > > > Then my comment is to rewrite it to: > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > " restart_lsn, catalog_xmin, two_phase, failover," > > " database, conflict_reason" > > " FROM pg_catalog.pg_replication_slots" > > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; > > > > to ensure the operator "=" is coming from the pg_catalog schema. > > > > Thanks for the details, but slot-sync does not use SPI calls, it uses > libpqrcv calls. Sorry for the confusion, I meant to say "remote SQL calls". > So is this change needed? The example I provided is a "fake" one (as currently the "=" operator is not used in the const char *query in synchronize_slots()). So there is currently nothing to change here. I just want to highlight that if we are using (or will use) operators in the remote SQL calls then we should ensure they are coming from the pg_catalog schema (as in the example provided above). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Feb 23, 2024 at 09:43:48AM +0530, shveta malik wrote: > On Fri, Feb 23, 2024 at 8:35 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Suppose that in synchronize_slots() the query would be: > > > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > > " restart_lsn, catalog_xmin, two_phase, failover," > > > " database, conflict_reason" > > > " FROM pg_catalog.pg_replication_slots" > > > " WHERE failover and NOT temporary and 1 = 1"; > > > > > > Then my comment is to rewrite it to: > > > > > > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn," > > > " restart_lsn, catalog_xmin, two_phase, failover," > > > " database, conflict_reason" > > > " FROM pg_catalog.pg_replication_slots" > > > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1"; > > > > > > to ensure the operator "=" is coming from the pg_catalog schema. > > > > > > > Thanks for the details, but slot-sync does not use SPI calls, it uses > > libpqrcv calls. So is this change needed? > > Additionally, I would like to have a better understanding of why it's > necessary and whether it addresses any potential security risks. Because one could create say the "=" OPERATOR in their own schema, attach a function to it doing undesired stuff and change the search_path for the database the sync slot worker connects to. Then this new "=" operator would be used (instead of the pg_catalog.= one), triggering the "undesired" function as superuser. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > Because one could create say the "=" OPERATOR in their own schema, attach a > function to it doing undesired stuff and change the search_path for the database > the sync slot worker connects to. > > Then this new "=" operator would be used (instead of the pg_catalog.= one), > triggering the "undesired" function as superuser. Thanks for the details. I understand it now. We do not use '=' in our main slots-fetch query, but we do use '=' in the remote-validation query. See validate_remote_info(). Do you think that, instead of doing the above, we can override search_path with an empty string in the slot-sync case, similar to the logical apply worker and autovacuum worker cases (see InitializeLogRepWorker(), AutoVacWorkerMain())? thanks Shveta
Hi, On Fri, Feb 23, 2024 at 02:15:11PM +0530, shveta malik wrote: > On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > Because one could create say the "=" OPERATOR in their own schema, attach a > > function to it doing undesired stuff and change the search_path for the database > > the sync slot worker connects to. > > > > Then this new "=" operator would be used (instead of the pg_catalog.= one), > > triggering the "undesired" function as superuser. > > Thanks for the details. I understand it now. We do not use '=' in our > main slots-fetch query but we do use '=' in remote-validation query. > See validate_remote_info(). Oh, right, I missed it during the review. > Do you think instead of doing the above, > we can override search-path with empty string in the slot-sync case. > SImilar to logical apply worker and autovacuum worker case (see > InitializeLogRepWorker(), AutoVacWorkerMain()). Yeah, we should definitively ensure that any operators being used in the query is coming from the pg_catalog schema (could be by setting the search path or using the up-thread proposal). Setting the search path would prevent any risks in case the query is changed later on, so I'd vote for changing the search path in validate_remote_info() and in synchronize_slots() to be on the safe side. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Friday, February 23, 2024 5:07 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 02:15:11PM +0530, shveta malik wrote: > > On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > Because one could create say the "=" OPERATOR in their own schema, > > > attach a function to it doing undesired stuff and change the > > > search_path for the database the sync slot worker connects to. > > > > > > Then this new "=" operator would be used (instead of the > > > pg_catalog.= one), triggering the "undesired" function as superuser. > > > > Thanks for the details. I understand it now. We do not use '=' in our > > main slots-fetch query but we do use '=' in remote-validation query. > > See validate_remote_info(). > > Oh, right, I missed it during the review. > > > Do you think instead of doing the above, we can override search-path > > with empty string in the slot-sync case. > > SImilar to logical apply worker and autovacuum worker case (see > > InitializeLogRepWorker(), AutoVacWorkerMain()). > > Yeah, we should definitively ensure that any operators being used in the query > is coming from the pg_catalog schema (could be by setting the search path or > using the up-thread proposal). > > Setting the search path would prevent any risks in case the query is changed > later on, so I'd vote for changing the search path in validate_remote_info() and > in synchronize_slots() to be on the safe side. I think to set secure search path for remote connection, the standard approach could be to extend the code in libpqrcv_connect[1], so that we don't need to schema qualify all the operators in the queries. And for local connection, I agree it's also needed to add a SetConfigOption("search_path", "" call in the slotsync worker. [1] libpqrcv_connect ... if (logical) ... res = libpqrcv_PQexec(conn->streamConn, ALWAYS_SECURE_SEARCH_PATH_SQL); Best Regards, Hou zj
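For the local-connection side mentioned above, a one-line sketch mirroring what the logical replication apply worker already does at startup; applying it in the slot sync worker's initialization is the assumption here:

    /* Use an empty search_path so that unqualified names in any local lookups
     * cannot be captured by user schemas. */
    SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);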
On Friday, February 23, 2024 1:22 PM shveta malik <shveta.malik@gmail.com> wrote: > > Thanks for the patches. Had a quick look at v95_2, here are some > trivial comments: Thanks for the comments. > 6) streaming replication standby server slot names that logical walsender > processes will wait for > > Is it better to say it like this? (I leave this to your preference) > > streaming replication standby server slot names for which logical > walsender processes will wait. I feel the current one is better, so I didn't change it. The other comments have been addressed. Here is the V97 patch set, which addresses Shveta's comments. Besides, I'd like to clarify and discuss the behavior of standby_slot_names. As it stands in the patch, if the slots specified in standby_slot_names are dropped or invalidated, the logical walsender will issue a WARNING and continue to replicate the changes. Another option could be to have the walsender pause until the slot in standby_slot_names is re-created or becomes valid again. Does anyone else have an opinion on this matter? Best Regards, Hou zj
Attachment
Hi, On Fri, Feb 23, 2024 at 04:36:44AM +0000, Zhijie Hou (Fujitsu) wrote: > On Friday, February 23, 2024 10:18 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > > > Hi, > > > > > > Since the slotsync worker patch has been committed, I rebased the > > > remaining patches. > > > And here is the V95 patch set. > > > > > > Also, I fixed a bug in the current 0001 patch where the member of the > > > standby slot names list pointed to the freed memory after calling > > ProcessConfigFile(). > > > Now, we will obtain a new list when we call ProcessConfigFile(). The > > > optimization to only get the new list when the names actually change > > > has been removed. I think this change is acceptable because > > > ProcessConfigFile is not a frequent occurrence. > > > > > > Additionally, I reordered the tests in > > > 040_standby_failover_slots_sync.pl. Now the new test will be conducted > > > after the sync slot test to prevent the risk of the logical slot > > > occasionally not catching up to the latest catalog_xmin and, as a result, not > > being able to be synced immediately. > > > > There is one unexpected change in the previous version, sorry for that. > > Here is the correct version. > > I noticed one CFbot failure[1] which is because the tap-test doesn't wait for the > standby to catch up before promoting, thus the data inserted after promotion > could not be replicated to the subscriber. Add a wait_for_replay_catchup to fix it. > > Apart from this, I also adjusted some variable names in the tap-test to be > consistent. And added back a mis-removed ProcessConfigFile call. Thanks! Here are some random comments: 1 === Commit message "Allow logical walsenders to wait for the physical" s/physical/physical standby/? 2 == +++ b/src/backend/replication/logical/logicalfuncs.c @@ -30,6 +30,7 @@ #include "replication/decode.h" #include "replication/logical.h" #include "replication/message.h" +#include "replication/walsender.h" Is this include needed? 3 === + * Slot sync is currently not supported on the cascading standby. This is s/on the/on a/? 4 === + if (!ok) + GUC_check_errdetail("List syntax is invalid."); + + /* + * If there is a syntax error in the name or if the replication slots' + * data is not initialized yet (i.e., we are in the startup process), skip + * the slot verification. + */ + if (!ok || !ReplicationSlotCtl) + { + pfree(rawname); + list_free(elemlist); + return ok; + } we are testing the "ok" value twice, what about using if...else if... instead and test it once? If so, it might be worth to put the: " + pfree(rawname); + list_free(elemlist); + return ok; " in a "goto". 5 === + * for which all standbys to wait for. Even if we have physical-slots s/physical-slots/physical slots/? 6 === * Switch to the same memory context under which GUC variables are s/to the same memory/to the memory/? 7 === + * Return a copy of standby_slot_names_list if the copy flag is set to true, Not sure, but would it be worth explaining why one would want to set to flag to true or false? (i.e why one would not want to receive the original list). 8 === + if (RecoveryInProgress()) + return NIL; The need is well documented just above, but are we not violating the fact that we return the original list or a copy of it? (that's what the comment above the GetStandbySlotList() function definition is saying). I think the comment above the GetStandbySlotList() function needs a bit of rewording to cover that case. 9 === + * harmless, a WARNING should be enough, no need to error-out. s/error-out/error out/? 
10 === + if (slot->data.invalidated != RS_INVAL_NONE) + { + /* + * Specified physical slot have been invalidated, so no point + * in waiting for it. We discovered in [1], that if the wal_status is "unreserved" then the slot is still serving the standby. I think we should handle this case differently, thoughts? 11 === + * Specified physical slot have been invalidated, so no point s/have been/has been/? 12 === +++ b/src/backend/replication/slotfuncs.c @@ -22,6 +22,7 @@ #include "replication/logical.h" #include "replication/slot.h" #include "replication/slotsync.h" +#include "replication/walsender.h" Is this include needed? [1]: https://www.postgresql.org/message-id/CALj2ACWE9asmvN1B18LqfHE8uBuWGsCEP7OO5trRCxPtTPeHVA%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Feb 23, 2024 at 09:46:00AM +0000, Zhijie Hou (Fujitsu) wrote: > On Friday, February 23, 2024 1:22 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > Thanks for the patches. Had a quick look at v95_2, here are some > > trivial comments: > > Thanks for the comments. > > > 6) streaming replication standby server slot names that logical walsender > > processes will wait for > > > > Is it better to say it like this? (I leave this to your preference) > > > > streaming replication standby server slot names for which logical > > walsender processes will wait. > > I feel the current one seems better, so didn’t change. Other comments have been > addressed. Here is the V97 patch set which addressed Shveta's comments. > > > Besides, I'd like to clarify and discuss the behavior of standby_slot_names once. > > As it stands in the patch, If the slots specified in standby_slot_names are > dropped or invalidated, the logical walsender will issue a WARNING and continue > to replicate the changes. Another option for this could be to have the > walsender pause until the slot in standby_slot_names is re-created or becomes > valid again. Does anyone else have an opinion on this matter ? Good point. I'd vote for: the only reasons not to wait are that the slots mentioned in standby_slot_names exist, are valid, and have caught up, or that standby_slot_names is empty. The reason is that setting standby_slot_names to a non-empty value means that one wants the walsender to wait until the standby catches up. The way to remove this intentional behavior should be by changing the standby_slot_names value (not the existence or the state of the slot(s) it points to). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
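To make the proposed policy concrete, a pseudocode-style sketch of the decision Bertrand describes; the function shown here and the helper name StandbySlotsHaveCaughtUp() are illustrative only, not the patch's actual API:

    static bool
    WalSndStandbySlotsReady(XLogRecPtr wait_for_lsn)
    {
        /* Empty standby_slot_names: the user opted out, so do not wait. */
        if (standby_slot_names_list == NIL)
            return true;

        /*
         * Otherwise proceed only when every configured slot exists, is valid,
         * and has confirmed wait_for_lsn; a dropped or invalidated slot keeps
         * the walsender waiting instead of being silently skipped.
         */
        return StandbySlotsHaveCaughtUp(standby_slot_names_list, wait_for_lsn);
    }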
Hi, On Fri, Feb 23, 2024 at 09:30:58AM +0000, Zhijie Hou (Fujitsu) wrote: > On Friday, February 23, 2024 5:07 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 02:15:11PM +0530, shveta malik wrote: > > > > > > Thanks for the details. I understand it now. We do not use '=' in our > > > main slots-fetch query but we do use '=' in remote-validation query. > > > See validate_remote_info(). > > > > Oh, right, I missed it during the review. > > > > > Do you think instead of doing the above, we can override search-path > > > with empty string in the slot-sync case. > > > SImilar to logical apply worker and autovacuum worker case (see > > > InitializeLogRepWorker(), AutoVacWorkerMain()). > > > > Yeah, we should definitively ensure that any operators being used in the query > > is coming from the pg_catalog schema (could be by setting the search path or > > using the up-thread proposal). > > > > Setting the search path would prevent any risks in case the query is changed > > later on, so I'd vote for changing the search path in validate_remote_info() and > > in synchronize_slots() to be on the safe side. > > I think to set secure search path for remote connection, the standard approach > could be to extend the code in libpqrcv_connect[1], so that we don't need to schema > qualify all the operators in the queries. > > And for local connection, I agree it's also needed to add a > SetConfigOption("search_path", "" call in the slotsync worker. > > [1] > libpqrcv_connect > ... > if (logical) > ... > res = libpqrcv_PQexec(conn->streamConn, > ALWAYS_SECURE_SEARCH_PATH_SQL); > Agree, something like in the attached? (it's .txt to not disturb the CF bot). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
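As a rough sketch of the two changes discussed in the message above (illustrative only, not the attached patch): the remote side reuses the secure-search-path query that libpqrcv_connect() already issues for logical connections, and the slotsync worker clears search_path for its local catalog lookups the way the autovacuum and logical apply workers do. The exact placement and GUC context values are assumptions based on those existing workers.

/* Remote side: run the secure-search-path query right after connecting,
 * as libpqrcv_connect() already does for logical connections. */
res = libpqrcv_PQexec(conn->streamConn, ALWAYS_SECURE_SEARCH_PATH_SQL);

/* Local side (slotsync worker startup): force an empty search_path so
 * that catalog queries only resolve objects in pg_catalog. */
SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);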
On Friday, February 23, 2024 6:12 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Here are some random comments: Thanks for the comments! > > 1 === > > Commit message "Allow logical walsenders to wait for the physical" > > s/physical/physical standby/? > > 2 == > > +++ b/src/backend/replication/logical/logicalfuncs.c > @@ -30,6 +30,7 @@ > #include "replication/decode.h" > #include "replication/logical.h" > #include "replication/message.h" > +#include "replication/walsender.h" > > Is this include needed? Removed. > > 3 === > > + * Slot sync is currently not supported on the cascading > + standby. This is > > s/on the/on a/? Changed. > > 4 === > > + if (!ok) > + GUC_check_errdetail("List syntax is invalid."); > + > + /* > + * If there is a syntax error in the name or if the replication slots' > + * data is not initialized yet (i.e., we are in the startup process), skip > + * the slot verification. > + */ > + if (!ok || !ReplicationSlotCtl) > + { > + pfree(rawname); > + list_free(elemlist); > + return ok; > + } > > we are testing the "ok" value twice, what about using if...else if... instead and > test it once? If so, it might be worth to put the: > > " > + pfree(rawname); > + list_free(elemlist); > + return ok; > " > > in a "goto". There were comments to remove the 'goto' statement and avoid duplicate free code, so I prefer the current style. > > 5 === > > + * for which all standbys to wait for. Even if we have > + physical-slots > > s/physical-slots/physical slots/? Changed. > > 6 === > > * Switch to the same memory context under which GUC variables are > > s/to the same memory/to the memory/? Changed. > > 7 === > > + * Return a copy of standby_slot_names_list if the copy flag is set to > + true, > > Not sure, but would it be worth explaining why one would want to set to flag to > true or false? (i.e why one would not want to receive the original list). I think the usage can be found from the caller's code, e.g we need to remove the slots that caught up from the list each time, so we cannot directly modify the global list. The GetStandbySlotList function is general function and I feel we can avoid adding more comments here. > > 8 === > > + if (RecoveryInProgress()) > + return NIL; > > The need is well documented just above, but are we not violating the fact that > we return the original list or a copy of it? (that's what the comment above the > GetStandbySlotList() function definition is saying). > > I think the comment above the GetStandbySlotList() function needs a bit of > rewording to cover that case. Adjusted. > > 9 === > > + * harmless, a WARNING should be enough, no need to > error-out. > > s/error-out/error out/? Changed. > > 10 === > > + if (slot->data.invalidated != RS_INVAL_NONE) > + { > + /* > + * Specified physical slot have been invalidated, > so no point > + * in waiting for it. > > We discovered in [1], that if the wal_status is "unreserved" then the slot is still > serving the standby. I think we should handle this case differently, thoughts? I think the 'invalidated' slot can still be used is a separate bug. Because once the slot is invalidated, it can neither protect WALs or ROWs from being removed even if the restart_lsn of the slot can be moved forward after being invalidated. If the standby can move restart_lsn forward for invalidated slots, then it should also set the 'invalidated' flag back to NONE, otherwise the slot cannot serve its purpose anymore. I also reported similar bug before[1]. 
> > 11 === > > + * Specified physical slot have been > + invalidated, so no point > > s/have been/has been/? Changed. > > 12 === > > +++ b/src/backend/replication/slotfuncs.c > @@ -22,6 +22,7 @@ > #include "replication/logical.h" > #include "replication/slot.h" > #include "replication/slotsync.h" > +#include "replication/walsender.h" > > Is this include needed? No, it's not needed. Removed. Attached is the V98 patch set, which addresses the above comments. I also adjusted a few comments based on off-list comments from Shveta. The discussion on the wait behavior is ongoing, so I didn't change the behavior in this version. [1] https://www.postgresql.org/message-id/flat/OS0PR01MB5716A626A4AF5814E057CEE39484A@OS0PR01MB5716.jpnprd01.prod.outlook.com Best Regards, Hou zj
Attachment
On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > I think to set secure search path for remote connection, the standard approach > > could be to extend the code in libpqrcv_connect[1], so that we don't need to schema > > qualify all the operators in the queries. > > > > And for local connection, I agree it's also needed to add a > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > [1] > > libpqrcv_connect > > ... > > if (logical) > > ... > > res = libpqrcv_PQexec(conn->streamConn, > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot). Thanks for the patch, changes look good. I have incorporated it into the patch, which addresses the rest of your comments in [1]. I have attached the patch as .txt. [1]: https://www.postgresql.org/message-id/ZdcejBDCr%2BwlVGnO%40ip-10-97-1-34.eu-west-3.compute.internal thanks Shveta
Attachment
On Fri, Feb 23, 2024 at 4:45 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 09:46:00AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > Besides, I'd like to clarify and discuss the behavior of standby_slot_names once. > > > > As it stands in the patch, If the slots specified in standby_slot_names are > > dropped or invalidated, the logical walsender will issue a WARNING and continue > > to replicate the changes. Another option for this could be to have the > > walsender pause until the slot in standby_slot_names is re-created or becomes > > valid again. Does anyone else have an opinion on this matter ? > > Good point, I'd vote for: the only reasons not to wait are: > > - slots mentioned in standby_slot_names exist and valid and do catch up > or > - standby_slot_names is empty > > The reason is that setting standby_slot_names to a non empty value means that > one wants the walsender to wait until the standby catchup. The way to remove this > intentional behavior should be by changing the standby_slot_names value (not the > existence or the state of the slot(s) it points too). > It seems we already do wait for the case when there is an inactive slot as per the below code [1] in the patch. So, probably waiting in other cases is also okay and also as this parameter is a SIGHUP parameter, users should be easily able to change its value if required. Do you think it is a good idea to mention this in docs as well? I think it is important to raise WARNING as the patch is doing in all the cases where the slot is not being processed so that users can be notified and they can take the required action. [1] - else if (XLogRecPtrIsInvalid(slot->data.restart_lsn) || + slot->data.restart_lsn < wait_for_lsn) + { + bool inactive = (slot->active_pid == 0); + + SpinLockRelease(&slot->mutex); + + /* Log warning if no active_pid for this physical slot */ + if (inactive) + ereport(WARNING, + errmsg("replication slot \"%s\" specified in parameter %s does not have active_pid", + name, "standby_slot_names"), + errdetail("Logical replication is waiting on the standby associated with \"%s\".", + name), + errhint("Consider starting standby associated with \"%s\" or amend standby_slot_names.", + name)); + + /* Continue if the current slot hasn't caught up. */ + continue; -- With Regards, Amit Kapila.
Hi, On Mon, Feb 26, 2024 at 02:18:58AM +0000, Zhijie Hou (Fujitsu) wrote: > On Friday, February 23, 2024 6:12 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > + if (!ok) > > + GUC_check_errdetail("List syntax is invalid."); > > + > > + /* > > + * If there is a syntax error in the name or if the replication slots' > > + * data is not initialized yet (i.e., we are in the startup process), skip > > + * the slot verification. > > + */ > > + if (!ok || !ReplicationSlotCtl) > > + { > > + pfree(rawname); > > + list_free(elemlist); > > + return ok; > > + } > > > > we are testing the "ok" value twice, what about using if...else if... instead and > > test it once? If so, it might be worth to put the: > > > > " > > + pfree(rawname); > > + list_free(elemlist); > > + return ok; > > " > > > > in a "goto". > > There were comments to remove the 'goto' statement and avoid > duplicate free code, so I prefer the current style. The duplicate free code would come from the if...else if... rewrite but then the "goto" would remove it, so I'm not sure to understand your point. > > > > 7 === > > > > + * Return a copy of standby_slot_names_list if the copy flag is set to > > + true, > > > > Not sure, but would it be worth explaining why one would want to set to flag to > > true or false? (i.e why one would not want to receive the original list). > > I think the usage can be found from the caller's code, e.g we need to remove > the slots that caught up from the list each time, so we cannot directly modify > the global list. The GetStandbySlotList function is general function and I feel > we can avoid adding more comments here. Okay, yeah makes sense. > > > > 10 === > > > > + if (slot->data.invalidated != RS_INVAL_NONE) > > + { > > + /* > > + * Specified physical slot have been invalidated, > > so no point > > + * in waiting for it. > > > > We discovered in [1], that if the wal_status is "unreserved" then the slot is still > > serving the standby. I think we should handle this case differently, thoughts? > > I think the 'invalidated' slot can still be used is a separate bug. > Because > once the slot is invalidated, it can neither protect WALs or ROWs from being > removed even if the restart_lsn of the slot can be moved forward after being invalidated. > > If the standby can move restart_lsn forward for invalidated slots, then > it should also set the 'invalidated' flag back to NONE, otherwise the slot > cannot serve its purpose anymore. I also reported similar bug before[1]. I see. But should'nt we add a check on restart_lsn as this is done here in pg_get_replication_slots()? " case WALAVAIL_REMOVED: /* * If we read the restart_lsn long enough ago, maybe that file * has been removed by now. However, the walsender could have * moved forward enough that it jumped to another file after * we looked. If checkpointer signalled the process to * termination, then it's definitely lost; but if a process is * still alive, then "unreserved" seems more appropriate. if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) " My point is that I think we should behave like it's not a bug and then adapt the code accordingly here (until the bug gets fixed). Currently we are not waiting for this slot while it's still serving the standby which does not seem good too, thoughts? > Attach the V98 patch set which addressed above comments. > I also adjusted few comments based on off-list comments from Shveta. Thanks! 
Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
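To make the restart_lsn suggestion above concrete, the extra test could look roughly like this. This is a sketch only, assuming the same slot fields used in the quoted patch code; whether this behavior is wanted at all is exactly what is being debated here.

/* An invalidated slot whose restart_lsn is still valid may merely be
 * "unreserved" and still serving the standby; only treat it as not worth
 * waiting for once its WAL is definitely gone. */
bool    definitely_lost = (slot->data.invalidated != RS_INVAL_NONE &&
                           XLogRecPtrIsInvalid(slot->data.restart_lsn));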
Hi, On Mon, Feb 26, 2024 at 09:13:05AM +0530, shveta malik wrote: > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > I think to set secure search path for remote connection, the standard approach > > > could be to extend the code in libpqrcv_connect[1], so that we don't need to schema > > > qualify all the operators in the queries. > > > > > > And for local connection, I agree it's also needed to add a > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > [1] > > > libpqrcv_connect > > > ... > > > if (logical) > > > ... > > > res = libpqrcv_PQexec(conn->streamConn, > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot). > > Thanks for the patch, changes look good. I have corporated it in the > patch which addresses the rest of your comments in [1]. I have > attached the patch as .txt Thanks! LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Feb 26, 2024 at 10:48:38AM +0530, Amit Kapila wrote: > On Fri, Feb 23, 2024 at 4:45 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Fri, Feb 23, 2024 at 09:46:00AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Besides, I'd like to clarify and discuss the behavior of standby_slot_names once. > > > > > > As it stands in the patch, If the slots specified in standby_slot_names are > > > dropped or invalidated, the logical walsender will issue a WARNING and continue > > > to replicate the changes. Another option for this could be to have the > > > walsender pause until the slot in standby_slot_names is re-created or becomes > > > valid again. Does anyone else have an opinion on this matter ? > > > > Good point, I'd vote for: the only reasons not to wait are: > > > > - slots mentioned in standby_slot_names exist and valid and do catch up > > or > > - standby_slot_names is empty > > > > The reason is that setting standby_slot_names to a non empty value means that > > one wants the walsender to wait until the standby catchup. The way to remove this > > intentional behavior should be by changing the standby_slot_names value (not the > > existence or the state of the slot(s) it points too). > > > > It seems we already do wait for the case when there is an inactive > slot as per the below code [1] in the patch. So, probably waiting in > other cases is also okay and also as this parameter is a SIGHUP > parameter, users should be easily able to change its value if > required. Agree. > Do you think it is a good idea to mention this in docs as > well? Yeah, I think the more the better. > I think it is important to raise WARNING as the patch is doing in all > the cases where the slot is not being processed so that users can be > notified and they can take the required action. +1 Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Feb 26, 2024 at 12:59 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Mon, Feb 26, 2024 at 02:18:58AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Friday, February 23, 2024 6:12 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > + if (!ok) > > > + GUC_check_errdetail("List syntax is invalid."); > > > + > > > + /* > > > + * If there is a syntax error in the name or if the replication slots' > > > + * data is not initialized yet (i.e., we are in the startup process), skip > > > + * the slot verification. > > > + */ > > > + if (!ok || !ReplicationSlotCtl) > > > + { > > > + pfree(rawname); > > > + list_free(elemlist); > > > + return ok; > > > + } > > > > > > we are testing the "ok" value twice, what about using if...else if... instead and > > > test it once? If so, it might be worth to put the: > > > > > > " > > > + pfree(rawname); > > > + list_free(elemlist); > > > + return ok; > > > " > > > > > > in a "goto". > > > > There were comments to remove the 'goto' statement and avoid > > duplicate free code, so I prefer the current style. > > The duplicate free code would come from the if...else if... rewrite but then > the "goto" would remove it, so I'm not sure to understand your point. > I think Hou-San wants to say that there was previously a comment to remove goto and now you are saying to introduce it. But, I think we can avoid both code duplication and goto, if the first thing we check in the function is ReplicationSlotCtl and return false if the same is not set. Won't that be better? > > > > > > > 10 === > > > > > > + if (slot->data.invalidated != RS_INVAL_NONE) > > > + { > > > + /* > > > + * Specified physical slot have been invalidated, > > > so no point > > > + * in waiting for it. > > > > > > We discovered in [1], that if the wal_status is "unreserved" then the slot is still > > > serving the standby. I think we should handle this case differently, thoughts? > > > > I think the 'invalidated' slot can still be used is a separate bug. > > Because > > once the slot is invalidated, it can neither protect WALs or ROWs from being > > removed even if the restart_lsn of the slot can be moved forward after being invalidated. > > > > If the standby can move restart_lsn forward for invalidated slots, then > > it should also set the 'invalidated' flag back to NONE, otherwise the slot > > cannot serve its purpose anymore. I also reported similar bug before[1]. > ... > > My point is that I think we should behave like it's not a bug and then adapt the > code accordingly here (until the bug gets fixed). > oh, I think this doesn't sound like a good idea to me. We should fix that bug independently rather than adding code in new features to consider the bug as a valid behavior. It will add the burden on us to remember and remove the additional new check(s). -- With Regards, Amit Kapila.
On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V98 patch set which addressed above comments. > Few comments: ============= 1. WalSndWaitForWal(XLogRecPtr loc) { int wakeEvents; + bool wait_for_standby = false; + uint32 wait_event; + List *standby_slots = NIL; static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr; + if (MyReplicationSlot->data.failover && replication_active) + standby_slots = GetStandbySlotList(true); + /* - * Fast path to avoid acquiring the spinlock in case we already know we - * have enough WAL available. This is particularly interesting if we're - * far behind. + * Check if all the standby servers have confirmed receipt of WAL up to + * RecentFlushPtr even when we already know we have enough WAL available. + * + * Note that we cannot directly return without checking the status of + * standby servers because the standby_slot_names may have changed, which + * means there could be new standby slots in the list that have not yet + * caught up to the RecentFlushPtr. */ - if (RecentFlushPtr != InvalidXLogRecPtr && - loc <= RecentFlushPtr) - return RecentFlushPtr; + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr) + { + FilterStandbySlots(RecentFlushPtr, &standby_slots); I think even if the slot list is not changed, we will always process each slot mentioned in standby_slot_names once. Can't we cache the previous list of slots for we have already waited for? In that case, we won't even need to copy the list via GetStandbySlotList() unless we need to wait. 2. WalSndWaitForWal(XLogRecPtr loc) { + /* + * Update the standby slots that have not yet caught up to the flushed + * position. It is good to wait up to RecentFlushPtr and then let it + * send the changes to logical subscribers one by one which are + * already covered in RecentFlushPtr without needing to wait on every + * change for standby confirmation. + */ + if (wait_for_standby) + FilterStandbySlots(RecentFlushPtr, &standby_slots); + /* Update our idea of the currently flushed position. */ - if (!RecoveryInProgress()) + else if (!RecoveryInProgress()) RecentFlushPtr = GetFlushRecPtr(NULL); else RecentFlushPtr = GetXLogReplayRecPtr(NULL); ... /* * If postmaster asked us to stop, don't wait anymore. * * It's important to do this check after the recomputation of * RecentFlushPtr, so we can send all remaining data before shutting * down. */ if (got_STOPPING) break; I think because 'wait_for_standby' may not be set in the first or consecutive cycles we may send the WAL to the logical subscriber before sending it to the physical subscriber during shutdown. -- With Regards, Amit Kapila.
On Mon, Feb 26, 2024 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > + if (!ok) > > > > + GUC_check_errdetail("List syntax is invalid."); > > > > + > > > > + /* > > > > + * If there is a syntax error in the name or if the replication slots' > > > > + * data is not initialized yet (i.e., we are in the startup process), skip > > > > + * the slot verification. > > > > + */ > > > > + if (!ok || !ReplicationSlotCtl) > > > > + { > > > > + pfree(rawname); > > > > + list_free(elemlist); > > > > + return ok; > > > > + } > > > > > > > > we are testing the "ok" value twice, what about using if...else if... instead and > > > > test it once? If so, it might be worth to put the: > > > > > > > > " > > > > + pfree(rawname); > > > > + list_free(elemlist); > > > > + return ok; > > > > " > > > > > > > > in a "goto". > > > > > > There were comments to remove the 'goto' statement and avoid > > > duplicate free code, so I prefer the current style. > > > > The duplicate free code would come from the if...else if... rewrite but then > > the "goto" would remove it, so I'm not sure to understand your point. > > > > I think Hou-San wants to say that there was previously a comment to > remove goto and now you are saying to introduce it. But, I think we > can avoid both code duplication and goto, if the first thing we check > in the function is ReplicationSlotCtl and return false if the same is > not set. Won't that be better?

I think we cannot do that, as we need to check at least the syntax before we return due to a NULL ReplicationSlotCtl. We get a NULL ReplicationSlotCtl during instance startup in check_standby_slot_names() as the postmaster first loads the GUC table and then initializes the shared memory for replication slots. See the calls of InitializeGUCOptions() and CreateSharedMemoryAndSemaphores() in PostmasterMain(). FWIW, I do not have any issue with the current code either, but if we have to change it, is [1] any better?

[1]:
check_standby_slot_names()
{
    ....
    if (!ok)
    {
        GUC_check_errdetail("List syntax is invalid.");
    }
    else if (ReplicationSlotCtl)
    {
        foreach-loop for slot validation
    }

    pfree(rawname);
    list_free(elemlist);
    return ok;
}

thanks
Shveta
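For concreteness, [1] filled in with the fragments quoted earlier in the thread could look roughly like the following. This is a sketch only; the body of the else-if branch (the per-slot validation) is left to the patch, and the exact function name/signature may differ.

static bool
validate_standby_slots(char **newval)
{
    char       *rawname;
    List       *elemlist;
    bool        ok;

    /* Need a modifiable copy of the GUC string. */
    rawname = pstrdup(*newval);

    /* Verify syntax and parse the string into a list of identifiers. */
    ok = SplitIdentifierString(rawname, ',', &elemlist);

    if (!ok)
        GUC_check_errdetail("List syntax is invalid.");
    else if (ReplicationSlotCtl)
    {
        /* Slot data is initialized: verify each listed slot here
         * (existence and slot type), setting ok = false on failure. */
    }

    pfree(rawname);
    list_free(elemlist);
    return ok;
}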
Hi, On Mon, Feb 26, 2024 at 05:18:25PM +0530, Amit Kapila wrote: > On Mon, Feb 26, 2024 at 12:59 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > 10 === > > > > > > > > + if (slot->data.invalidated != RS_INVAL_NONE) > > > > + { > > > > + /* > > > > + * Specified physical slot have been invalidated, > > > > so no point > > > > + * in waiting for it. > > > > > > > > We discovered in [1], that if the wal_status is "unreserved" then the slot is still > > > > serving the standby. I think we should handle this case differently, thoughts? > > > > > > I think the 'invalidated' slot can still be used is a separate bug. > > > Because > > > once the slot is invalidated, it can neither protect WALs or ROWs from being > > > removed even if the restart_lsn of the slot can be moved forward after being invalidated. > > > > > > If the standby can move restart_lsn forward for invalidated slots, then > > > it should also set the 'invalidated' flag back to NONE, otherwise the slot > > > cannot serve its purpose anymore. I also reported similar bug before[1]. > > > ... > > > > My point is that I think we should behave like it's not a bug and then adapt the > > code accordingly here (until the bug gets fixed). > > > > oh, I think this doesn't sound like a good idea to me. We should fix > that bug independently rather than adding code in new features to > consider the bug as a valid behavior. Agree, but it all depends if there is a consensus of the other thread being a bug or not. I also think it is but there is this part of the code in pg_get_replication_slots() that makes me think ones could think it is not. " case WALAVAIL_REMOVED: /* * If we read the restart_lsn long enough ago, maybe that file * has been removed by now. However, the walsender could have * moved forward enough that it jumped to another file after * we looked. If checkpointer signalled the process to * termination, then it's definitely lost; but if a process is * still alive, then "unreserved" seems more appropriate. * " Anyway, I also think it is a bug so agree to keep the check as it is currenlty ( and keep an eye on the other thread outcome too). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Feb 26, 2024 at 05:52:40PM +0530, shveta malik wrote: > On Mon, Feb 26, 2024 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > + if (!ok) > > > > > + GUC_check_errdetail("List syntax is invalid."); > > > > > + > > > > > + /* > > > > > + * If there is a syntax error in the name or if the replication slots' > > > > > + * data is not initialized yet (i.e., we are in the startup process), skip > > > > > + * the slot verification. > > > > > + */ > > > > > + if (!ok || !ReplicationSlotCtl) > > > > > + { > > > > > + pfree(rawname); > > > > > + list_free(elemlist); > > > > > + return ok; > > > > > + } > > > > > > > > > > we are testing the "ok" value twice, what about using if...else if... instead and > > > > > test it once? If so, it might be worth to put the: > > > > > > > > > > " > > > > > + pfree(rawname); > > > > > + list_free(elemlist); > > > > > + return ok; > > > > > " > > > > > > > > > > in a "goto". > > > > > > > > There were comments to remove the 'goto' statement and avoid > > > > duplicate free code, so I prefer the current style. > > > > > > The duplicate free code would come from the if...else if... rewrite but then > > > the "goto" would remove it, so I'm not sure to understand your point. > > > > > > > I think Hou-San wants to say that there was previously a comment to > > remove goto and now you are saying to introduce it. But, I think we > > can avoid both code duplication and goto, if the first thing we check > > in the function is ReplicationSlotCtl and return false if the same is > > not set. Won't that be better? > > I think we can not do that as we need to check atleast syntax before > we return due to NULL ReplicationSlotCtl. We get NULL > ReplicationSlotCtl during instance startup in > check_standby_slot_names() as postmaster first loads GUC-table and > then initializes shared-memory for replication slots. See calls of > InitializeGUCOptions() and CreateSharedMemoryAndSemaphores() in > PostmasterMain(). FWIW, I do not have any issue with current code as > well, but if we have to change it, is [1] any better? > > [1]: > check_standby_slot_names() > { > .... > if (!ok) > { > GUC_check_errdetail("List syntax is invalid."); > } > else if (ReplicationSlotCtl) > { > foreach-loop for slot validation > } > > pfree(rawname); > list_free(elemlist); > return ok; > } > Yeah thanks, it does not test the "ok" value twice and get rid of the goto while checking the syntax first: I'd vote for it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Monday, February 26, 2024 1:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 4:45 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Fri, Feb 23, 2024 at 09:46:00AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Besides, I'd like to clarify and discuss the behavior of standby_slot_names > once. > > > > > > As it stands in the patch, If the slots specified in > > > standby_slot_names are dropped or invalidated, the logical walsender > > > will issue a WARNING and continue to replicate the changes. Another > > > option for this could be to have the walsender pause until the slot > > > in standby_slot_names is re-created or becomes valid again. Does anyone > else have an opinion on this matter ? > > > > Good point, I'd vote for: the only reasons not to wait are: > > > > - slots mentioned in standby_slot_names exist and valid and do catch > > up or > > - standby_slot_names is empty > > > > The reason is that setting standby_slot_names to a non empty value > > means that one wants the walsender to wait until the standby catchup. > > The way to remove this intentional behavior should be by changing the > > standby_slot_names value (not the existence or the state of the slot(s) it > points too). > > > > It seems we already do wait for the case when there is an inactive slot as per the > below code [1] in the patch. So, probably waiting in other cases is also okay and > also as this parameter is a SIGHUP parameter, users should be easily able to > change its value if required. Do you think it is a good idea to mention this in > docs as well? > > I think it is important to raise WARNING as the patch is doing in all the cases > where the slot is not being processed so that users can be notified and they can > take the required action. Agreed. Here is the V99 patch which addressed the above. This version also includes: 1. list_free the slot list when reloading the list due to GUC change. 2. Refactored the validate_standby_slots based on Shveta's suggestion. 3. Added errcode for the warnings as most of existing have errcodes. Amit's latest comments[1] are pending, we will address that in next version. [1] https://www.postgresql.org/message-id/CAA4eK1LJdmGATWG%3DxOD1CB9cogukk2cLNBGH8h-n-ZDJuwBdJg%40mail.gmail.com Best Regards, Hou zj
Attachment
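To illustrate item 3 of the message above (attaching errcodes to the existing warnings), the inactive-slot warning quoted earlier in the thread might end up looking roughly like this. The particular errcode used here is an assumption for illustration, not taken from the patch.

ereport(WARNING,
        errcode(ERRCODE_INVALID_PARAMETER_VALUE),   /* assumed; the patch may pick a different code */
        errmsg("replication slot \"%s\" specified in parameter %s does not have active_pid",
               name, "standby_slot_names"),
        errdetail("Logical replication is waiting on the standby associated with \"%s\".",
                  name),
        errhint("Consider starting standby associated with \"%s\" or amend standby_slot_names.",
                name));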
Here are some review comments for v99-0001 ========== 0. GENERAL. +#standby_slot_names = '' # streaming replication standby server slot names that + # logical walsender processes will wait for IMO the GUC name is too generic. There is nothing in this name to suggest it has anything to do with logical slot synchronization; that meaning is only found in the accompanying comment -- it would be better if the GUC name itself were more self-explanatory. e.g. Maybe like 'wal_sender_sync_standby_slot_names' or 'wal_sender_standby_slot_names', 'wal_sender_wait_for_standby_slots', or ... (Of course, this has some impact across docs and comments and variables in the patch). ========== Commit Message 1. A new parameter named standby_slot_names is introduced. Maybe quote the GUC names here to make it more readable. ~~ 2. Additionally, The SQL functions pg_logical_slot_get_changes and pg_replication_slot_advance are modified to wait for the replication slots mentioned in standby_slot_names to catch up before returning the changes to the user. ~ 2a. "pg_replication_slot_advance" is a typo? Did you mean pg_logical_replication_slot_advance? ~ 2b. The "before returning the changes to the user" seems like it is referring only to the first function. Maybe needs slight rewording like: /before returning the changes to the user./ before returning./ ========== doc/src/sgml/config.sgml 3. standby_slot_names + <para> + List of physical slots guarantees that logical replication slots with + failover enabled do not consume changes until those changes are received + and flushed to corresponding physical standbys. If a logical replication + connection is meant to switch to a physical standby after the standby is + promoted, the physical replication slot for the standby should be listed + here. Note that logical replication will not proceed if the slots + specified in the standby_slot_names do not exist or are invalidated. + </para> The wording doesn't seem right. IMO this should be worded much like how this GUC is described in guc_tables.c e.g something a bit like: Lists the streaming replication standby server slot names that logical WAL sender processes will wait for. Logical WAL sender processes will send decoded changes to plugins only after the specified replication slots confirm receiving WAL. This guarantees that logical replication slots with failover enabled do not consume changes until those changes are received and flushed to corresponding physical standbys... ========== doc/src/sgml/logicaldecoding.sgml 4. Section 48.2.3 Replication Slot Synchronization + It's also highly recommended that the said physical replication slot + is named in + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + list on the primary, to prevent the subscriber from consuming changes + faster than the hot standby. But once we configure it, then certain latency + is expected in sending changes to logical subscribers due to wait on + physical replication slots in + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> 4a. /It's also highly/It is highly/ ~ 4b. BEFORE But once we configure it, then certain latency is expected in sending changes to logical subscribers due to wait on physical replication slots in <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> SUGGESTION Even when correctly configured, some latency is expected when sending changes to logical subscribers due to the waiting on slots named in standby_slot_names. 
========== .../replication/logical/logicalfuncs.c 5. pg_logical_slot_get_changes_guts + if (XLogRecPtrIsInvalid(upto_lsn)) + wait_for_wal_lsn = end_of_wal; + else + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); + + /* + * Wait for specified streaming replication standby servers (if any) + * to confirm receipt of WAL up to wait_for_wal_lsn. + */ + WaitForStandbyConfirmation(wait_for_wal_lsn); Perhaps those statements all belong together with the comment up-front. e.g. + /* + * Wait for specified streaming replication standby servers (if any) + * to confirm receipt of WAL up to wait_for_wal_lsn. + */ + if (XLogRecPtrIsInvalid(upto_lsn)) + wait_for_wal_lsn = end_of_wal; + else + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); + WaitForStandbyConfirmation(wait_for_wal_lsn); ========== src/backend/replication/logical/slotsync.c ========== src/backend/replication/slot.c 6. +static bool +validate_standby_slots(char **newval) +{ + char *rawname; + List *elemlist; + ListCell *lc; + bool ok; + + /* Need a modifiable copy of string */ + rawname = pstrdup(*newval); + + /* Verify syntax and parse string into a list of identifiers */ + ok = SplitIdentifierString(rawname, ',', &elemlist); + + if (!ok) + { + GUC_check_errdetail("List syntax is invalid."); + } + + /* + * If the replication slots' data have been initialized, verify if the + * specified slots exist and are logical slots. + */ + else if (ReplicationSlotCtl) + { + foreach(lc, elemlist) 6a. So, if the ReplicationSlotCtl is NULL, it is possible to return ok=true without ever checking if the slots exist or are of the correct kind. I am wondering what are the ramifications of that. -- e.g. assuming names are OK when maybe they aren't OK at all. AFAICT this works because it relies on getting subsequent WARNINGS when calling FilterStandbySlots(). If that is correct then maybe the comment here can be enhanced to say so. Indeed, if it works like that, now I am wondering do we need this for loop validation at all. e.g. it seems just a matter of timing whether we get ERRORs validating the GUC here, or WARNINGS later in the FilterStandbySlots. Maybe we don't need the double-checking and it is enough to check in FilterStandbySlots? ~ 6b. AFAIK there are alternative foreach macros available now, so you shouldn't need to declare the ListCell. ~~~ 7. check_standby_slot_names +bool +check_standby_slot_names(char **newval, void **extra, GucSource source) +{ + if (strcmp(*newval, "") == 0) + return true; Using strcmp seems like an overkill way to check for empty string. SUGGESTION if (*newval == '\0') return true; ~~~ 8. + if (strcmp(*newval, "*") == 0) + { + GUC_check_errdetail("\"%s\" is not accepted for standby_slot_names", + *newval); + return false; + } It seems overkill to use a format specifier when "*" is already the known value. SUGGESTION GUC_check_errdetail("Wildcard \"*\" is not accepted for standby_slot_names."); ~~~ 9. + /* Now verify if the specified slots really exist and have correct type */ + if (!validate_standby_slots(newval)) + return false; As in a prior comment, if ReplicationSlotCtl is NULL then it is not always going to do exactly what that comment says it is doing... ~~~ 10. assign_standby_slot_names + if (!SplitIdentifierString(standby_slot_names_cpy, ',', &standby_slots)) + { + /* This should not happen if GUC checked check_standby_slot_names. */ + elog(ERROR, "invalid list syntax"); + } I didn't see how it is possible to get here without having first executed check_standby_slot_names. 
But, if it can happen, then maybe describe the scenario in the comment. ~~~ 11. + * Note that since we do not support syncing slots to cascading standbys, we + * return NIL if we are running in a standby to indicate that no standby slots + * need to be waited for, regardless of the copy flag value. I didn't understand the relevance of even mentioning "regardless of the copy flag value". ~~~ 12. FilterStandbySlots + errhint("Consider starting standby associated with \"%s\" or amend standby_slot_names.", + name)); This errhint should use a format substitution for the GUC "standby_slot_names" for consistency with everything else. ~~~ 13. WaitForStandbyConfirmation + /* + * We wait for the slots in the standby_slot_names to catch up, but we + * use a timeout (1s) so we can also check the if the + * standby_slot_names has been changed. + */ + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, 1000, + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION); Typo "the if the" ========== src/backend/replication/slotfuncs.c 14. pg_physical_replication_slot_advance + + PhysicalWakeupLogicalWalSnd(); Should this have a comment to say what it is for? ========== src/backend/replication/walsender.c 15. +/* + * Wake up the logical walsender processes with failover enabled slots if the + * currently acquired physical slot is specified in standby_slot_names GUC. + */ +void +PhysicalWakeupLogicalWalSnd(void) +{ + ListCell *lc; + List *standby_slots; + + Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot)); + + standby_slots = GetStandbySlotList(false); + + foreach(lc, standby_slots) + { + char *name = lfirst(lc); + + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) + { + ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv); + return; + } + } +} 15a. There already exists another function called WalSndWakeup(bool physical, bool logical), so I think this new one should use a similar name pattern -- e.g. maybe like WalSndWakeupLogicalForSlotSync or ... ~ 15b. IIRC there are some new List macros you can use instead of needing to declare the ListCell? ========== .../utils/activity/wait_event_names.txt 16. +WAIT_FOR_STANDBY_CONFIRMATION "Waiting for the WAL to be received by physical standby." Moving the 'the' will make this more consistent with all other "Waiting for WAL..." names. SUGGESTION Waiting for WAL to be received by the physical standby. ========== src/backend/utils/misc/guc_tables.c 17. + { + {"standby_slot_names", PGC_SIGHUP, REPLICATION_PRIMARY, + gettext_noop("Lists streaming replication standby server slot " + "names that logical WAL sender processes will wait for."), + gettext_noop("Decoded changes are sent out to plugins by logical " + "WAL sender processes only after specified " + "replication slots confirm receiving WAL."), + GUC_LIST_INPUT | GUC_LIST_QUOTE + }, + &standby_slot_names, + "", + check_standby_slot_names, assign_standby_slot_names, NULL + }, The wording of the detail msg feels kind of backwards to me. BEFORE Decoded changes are sent out to plugins by logical WAL sender processes only after specified replication slots confirm receiving WAL. SUGGESTION Logical WAL sender processes will send decoded changes to plugins only after the specified replication slots confirm receiving WAL. ========== src/backend/utils/misc/postgresql.conf.sample 18. +#standby_slot_names = '' # streaming replication standby server slot names that + # logical walsender processes will wait for I'm not sure this is the best GUC name. See the general comment #0 above in this post. 
========== src/include/replication/slot.h ========== src/include/replication/walsender.h ========== src/include/replication/walsender_private.h ========== src/include/utils/guc_hooks.h ========== src/test/recovery/t/006_logical_decoding.pl 19. +# Pass failover=true (last-arg), it should not have any impact on advancing. SUGGESTION Passing failover=true (last arg) should not have any impact on advancing. ========== .../t/040_standby_failover_slots_sync.pl 20. +# +# | ----> standby1 (primary_slot_name = sb1_slot) +# | ----> standby2 (primary_slot_name = sb2_slot) +# primary ----- | +# | ----> subscriber1 (failover = true) +# | ----> subscriber2 (failover = false) In the diagram, the "--->" means a mixture of physical standbys and logical pub/sub replication. Maybe it can be a bit clearer? SUGGESTION # primary (publisher) # # (physical standbys) # | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 (primary_slot_name = sb2_slot) # # (logical replication) # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) ~~~ 21. +# Set up is configured in such a way that the logical slot of subscriber1 is +# enabled failover, thus it will wait for the physical slot of +# standby1(sb1_slot) to catch up before sending decoded changes to subscriber1. /is enabled failover/is enabled for failover/ ~~~ 22. +# Create another subscriber node without enabling failover, wait for sync to +# complete +my $subscriber2 = PostgreSQL::Test::Cluster->new('subscriber2'); +$subscriber2->init; +$subscriber2->start; +$subscriber2->safe_psql( + 'postgres', qq[ + CREATE TABLE tab_int (a int PRIMARY KEY); + CREATE SUBSCRIPTION regress_mysub2 CONNECTION '$publisher_connstr' PUBLICATION regress_mypub WITH (slot_name = lsub2_slot); +]); + +$subscriber1->wait_for_subscription_sync; + Is this meant to wait for 'subscription2'? ~~~ 23. # Stop the standby associated with the specified physical replication slot so # that the logical replication slot won't receive changes until the standby # comes up. Maybe this can give the values for better understanding: SUGGESTION Stop the standby associated with the specified physical replication slot (sb1_slot) so that the logical replication slot (lsub1_slot) won't receive changes until the standby comes up. ~~~ 24. +# Wait for the standby that's up and running gets the data from primary SUGGESTION Wait until the standby2 that's still running gets the data from the primary. ~~~ 25. +# Wait for the subscription that's up and running and is not enabled for failover. +# It gets the data from primary without waiting for any standbys. SUGGESTION Wait for subscription2 to get the data from the primary. This subscription was not enabled for failover so it gets the data without waiting for any standbys. ~~~ 26. +# The subscription that's up and running and is enabled for failover +# doesn't get the data from primary and keeps waiting for the +# standby specified in standby_slot_names. SUGGESTION The subscription1 was enabled for failover so it doesn't get the data from primary and keeps waiting for the standby specified in standby_slot_names (sb1_slot aka standby1). ~~~ 27. +# Start the standby specified in standby_slot_names and wait for it to catch +# up with the primary. SUGGESTION Start the standby specified in standby_slot_names (sb1_slot aka standby1) and wait for it to catch up with the primary. ~~~ 28. 
+# Now that the standby specified in standby_slot_names is up and running, +# primary must send the decoded changes to subscription enabled for failover +# While the standby was down, this subscriber didn't receive any data from +# primary i.e. the primary didn't allow it to go ahead of standby. SUGGESTION Now that the standby specified in standby_slot_names is up and running, the primary can send the decoded changes to the subscription enabled for failover (i.e. subscription1). While the standby was down, subscription1 didn't receive any data from the primary. i.e. the primary didn't allow it to go ahead of standby. ~~~ 29. +# Stop the standby associated with the specified physical replication slot so +# that the logical replication slot won't receive changes until the standby +# slot's restart_lsn is advanced or the slot is removed from the +# standby_slot_names list. +$primary->safe_psql('postgres', "TRUNCATE tab_int;"); +$primary->wait_for_catchup('regress_mysub1'); +$standby1->stop; Isn't this fragment more like the first step of the *next* TEST instead of the last step of this one? ~~~ 30. +################################################## +# Verify that when using pg_logical_slot_get_changes to consume changes from a +# logical slot with failover enabled, it will also wait for the slots specified +# in standby_slot_names to catch up. +################################################## AFAICT this test is checking only that the function cannot return while waiting for the stopped standby, but it doesn't seem to check that it *does* return when the stopped standby comes alive again. ~~~ 31. +$result = + $subscriber1->safe_psql('postgres', "SELECT count(*) = 0 FROM tab_int;"); +is($result, 't', + "subscriber1 doesn't get data as the sb1_slot doesn't catch up"); Do you think this fragment should have a comment? ====== Kind Regards, Peter Smith. Fujitsu Australia
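Picking up comments #7 and #8 from the review above, here is a sketch of the suggested checks at the top of check_standby_slot_names(). Note that newval is a char **, so the empty-string test has to look at the first character of *newval rather than the pointer itself; apart from that, this just restates the review suggestions.

/* Sketch of the empty-string and wildcard checks from comments #7 and #8. */
if ((*newval)[0] == '\0')
    return true;

if (strcmp(*newval, "*") == 0)
{
    GUC_check_errdetail("Wildcard \"*\" is not accepted for standby_slot_names.");
    return false;
}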
Hi, On Tue, Feb 27, 2024 at 06:17:44PM +1100, Peter Smith wrote: > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + ListCell *lc; > + bool ok; > + > + /* Need a modifiable copy of string */ > + rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into a list of identifiers */ > + ok = SplitIdentifierString(rawname, ',', &elemlist); > + > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); > + } > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + else if (ReplicationSlotCtl) > + { > + foreach(lc, elemlist) > > 6a. > So, if the ReplicationSlotCtl is NULL, it is possible to return > ok=true without ever checking if the slots exist or are of the correct > kind. I am wondering what are the ramifications of that. -- e.g. > assuming names are OK when maybe they aren't OK at all. AFAICT this > works because it relies on getting subsequent WARNINGS when calling > FilterStandbySlots(). If that is correct then maybe the comment here > can be enhanced to say so. > > Indeed, if it works like that, now I am wondering do we need this for > loop validation at all. e.g. it seems just a matter of timing whether > we get ERRORs validating the GUC here, or WARNINGS later in the > FilterStandbySlots. Maybe we don't need the double-checking and it is > enough to check in FilterStandbySlots? Good point, I have the feeling that it is enough to check in FilterStandbySlots(). Indeed, if the value is syntactically correct, then I think that its actual value "really" matters when the logical decoding is starting/running, does it provide additional benefits "before" that? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Feb 27, 2024 at 4:07 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Tue, Feb 27, 2024 at 06:17:44PM +1100, Peter Smith wrote: > > +static bool > > +validate_standby_slots(char **newval) > > +{ > > + char *rawname; > > + List *elemlist; > > + ListCell *lc; > > + bool ok; > > + > > + /* Need a modifiable copy of string */ > > + rawname = pstrdup(*newval); > > + > > + /* Verify syntax and parse string into a list of identifiers */ > > + ok = SplitIdentifierString(rawname, ',', &elemlist); > > + > > + if (!ok) > > + { > > + GUC_check_errdetail("List syntax is invalid."); > > + } > > + > > + /* > > + * If the replication slots' data have been initialized, verify if the > > + * specified slots exist and are logical slots. > > + */ > > + else if (ReplicationSlotCtl) > > + { > > + foreach(lc, elemlist) > > > > 6a. > > So, if the ReplicationSlotCtl is NULL, it is possible to return > > ok=true without ever checking if the slots exist or are of the correct > > kind. I am wondering what are the ramifications of that. -- e.g. > > assuming names are OK when maybe they aren't OK at all. AFAICT this > > works because it relies on getting subsequent WARNINGS when calling > > FilterStandbySlots(). If that is correct then maybe the comment here > > can be enhanced to say so. > > > > Indeed, if it works like that, now I am wondering do we need this for > > loop validation at all. e.g. it seems just a matter of timing whether > > we get ERRORs validating the GUC here, or WARNINGS later in the > > FilterStandbySlots. Maybe we don't need the double-checking and it is > > enough to check in FilterStandbySlots? > > Good point, I have the feeling that it is enough to check in FilterStandbySlots(). > I think it is better if we get earlier in a case where the parameter is changed and performed SIGHUP instead of waiting till we get to logical decoding. So, there is merit in keeping these checks during initial validation. -- With Regards, Amit Kapila.
On Tue, Feb 27, 2024 at 12:48 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v99-0001 > > ========== > 0. GENERAL. > > +#standby_slot_names = '' # streaming replication standby server slot names that > + # logical walsender processes will wait for > > IMO the GUC name is too generic. There is nothing in this name to > suggest it has anything to do with logical slot synchronization; that > meaning is only found in the accompanying comment -- it would be > better if the GUC name itself were more self-explanatory. > > e.g. Maybe like 'wal_sender_sync_standby_slot_names' or > 'wal_sender_standby_slot_names', 'wal_sender_wait_for_standby_slots', > or ... > It would be wrong and/or misleading to append wal_sender to this GUC name, as it is also used by the SQL APIs. Also, adding "wait" makes it sound more like a boolean. So, I don't see the proposed names as any better than the current one. > ~~~ > > 9. > + /* Now verify if the specified slots really exist and have correct type */ > + if (!validate_standby_slots(newval)) > + return false; > > As in a prior comment, if ReplicationSlotCtl is NULL then it is not > always going to do exactly what that comment says it is doing... > It will do what the comment says when invoked as part of the SIGHUP signal. I think the current comment is okay. -- With Regards, Amit Kapila.
On Tuesday, February 27, 2024 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v99-0001 Thanks for the comments! > Commit Message > > 1. > A new parameter named standby_slot_names is introduced. > > Maybe quote the GUC names here to make it more readable. Added. > > ~~ > > 2. > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication slots > mentioned in standby_slot_names to catch up before returning the changes to > the user. > > ~ > > 2a. > "pg_replication_slot_advance" is a typo? Did you mean > pg_logical_replication_slot_advance? pg_logical_replication_slot_advance is not a user visible function. So the pg_replication_slot_advance is correct. > > ~ > > 2b. > The "before returning the changes to the user" seems like it is referring only to > the first function. > > Maybe needs slight rewording like: > /before returning the changes to the user./ before returning./ Changed. > > ========== > doc/src/sgml/config.sgml > > 3. standby_slot_names > > + <para> > + List of physical slots guarantees that logical replication slots with > + failover enabled do not consume changes until those changes > are received > + and flushed to corresponding physical standbys. If a logical > replication > + connection is meant to switch to a physical standby after the > standby is > + promoted, the physical replication slot for the standby > should be listed > + here. Note that logical replication will not proceed if the slots > + specified in the standby_slot_names do not exist or are invalidated. > + </para> > > The wording doesn't seem right. IMO this should be worded much like how this > GUC is described in guc_tables.c > > e.g something a bit like: > > Lists the streaming replication standby server slot names that logical WAL > sender processes will wait for. Logical WAL sender processes will send > decoded changes to plugins only after the specified replication slots confirm > receiving WAL. This guarantees that logical replication slots with failover > enabled do not consume changes until those changes are received and flushed > to corresponding physical standbys... Changed. > > ========== > doc/src/sgml/logicaldecoding.sgml > > 4. Section 48.2.3 Replication Slot Synchronization > > + It's also highly recommended that the said physical replication slot > + is named in > + <link > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me></link> > + list on the primary, to prevent the subscriber from consuming changes > + faster than the hot standby. But once we configure it, then > certain latency > + is expected in sending changes to logical subscribers due to wait on > + physical replication slots in > + <link > + > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me> > + </link> > > 4a. > /It's also highly/It is highly/ > > ~ > > 4b. > > BEFORE > But once we configure it, then certain latency is expected in sending changes > to logical subscribers due to wait on physical replication slots in <link > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me></link> > > SUGGESTION > Even when correctly configured, some latency is expected when sending > changes to logical subscribers due to the waiting on slots named in > standby_slot_names. Changed. > > ========== > .../replication/logical/logicalfuncs.c > > 5. 
pg_logical_slot_get_changes_guts > > + if (XLogRecPtrIsInvalid(upto_lsn)) > + wait_for_wal_lsn = end_of_wal; > + else > + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); > + > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to wait_for_wal_lsn. > + */ > + WaitForStandbyConfirmation(wait_for_wal_lsn); > > Perhaps those statements all belong together with the comment up-front. e.g. > > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to wait_for_wal_lsn. > + */ > + if (XLogRecPtrIsInvalid(upto_lsn)) > + wait_for_wal_lsn = end_of_wal; > + else > + wait_for_wal_lsn = Min(upto_lsn, end_of_wal); > + WaitForStandbyConfirmation(wait_for_wal_lsn); Changed. > > ========== > src/backend/replication/logical/slotsync.c > > ========== > src/backend/replication/slot.c > > 6. > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + ListCell *lc; > + bool ok; > + > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into a list of identifiers */ ok = > + SplitIdentifierString(rawname, ',', &elemlist); > + > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); } > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + else if (ReplicationSlotCtl) > + { > + foreach(lc, elemlist) > > 6a. > So, if the ReplicationSlotCtl is NULL, it is possible to return ok=true without > ever checking if the slots exist or are of the correct kind. I am wondering what > are the ramifications of that. -- e.g. > assuming names are OK when maybe they aren't OK at all. AFAICT this works > because it relies on getting subsequent WARNINGS when calling > FilterStandbySlots(). If that is correct then maybe the comment here can be > enhanced to say so. > > Indeed, if it works like that, now I am wondering do we need this for loop > validation at all. e.g. it seems just a matter of timing whether we get ERRORs > validating the GUC here, or WARNINGS later in the FilterStandbySlots. Maybe > we don't need the double-checking and it is enough to check in > FilterStandbySlots? I think the check is OK so didn’t change this. > > ~ > > 6b. > AFAIK there are alternative foreach macros available now, so you shouldn't > need to declare the ListCell. Changed. > > ~~~ > > 7. check_standby_slot_names > > +bool > +check_standby_slot_names(char **newval, void **extra, GucSource source) > +{ if (strcmp(*newval, "") == 0) return true; > > Using strcmp seems like an overkill way to check for empty string. > > SUGGESTION > > if (*newval == '\0') > return true; > Changed. > ~~~ > > 8. > + if (strcmp(*newval, "*") == 0) > + { > + GUC_check_errdetail("\"%s\" is not accepted for standby_slot_names", > + *newval); return false; } > > It seems overkill to use a format specifier when "*" is already the known value. > > SUGGESTION > GUC_check_errdetail("Wildcard \"*\" is not accepted for > standby_slot_names."); > Changed. > ~~~ > > 9. > + /* Now verify if the specified slots really exist and have correct > + type */ if (!validate_standby_slots(newval)) return false; > > As in a prior comment, if ReplicationSlotCtl is NULL then it is not always going > to do exactly what that comment says it is doing... I think the comment is OK, one can check the detail in the function definition if needed. > > ~~~ > > 10. 
assign_standby_slot_names > > + if (!SplitIdentifierString(standby_slot_names_cpy, ',', > + &standby_slots)) { > + /* This should not happen if GUC checked check_standby_slot_names. */ > + elog(ERROR, "invalid list syntax"); } > > I didn't see how it is possible to get here without having first executed > check_standby_slot_names. But, if it can happen, then maybe describe the > scenario in the comment. This is sanity check which we don't expect to happen, which follows similar style of preprocessNamespacePath. > > ~~~ > > 11. > + * Note that since we do not support syncing slots to cascading > + standbys, we > + * return NIL if we are running in a standby to indicate that no > + standby slots > + * need to be waited for, regardless of the copy flag value. > > I didn't understand the relevance of even mentioning "regardless of the copy > flag value". Removed. > > ~~~ > > 12. FilterStandbySlots > > + errhint("Consider starting standby associated with \"%s\" or amend > standby_slot_names.", > + name)); > > This errhint should use a format substitution for the GUC "standby_slot_names" > for consistency with everything else. Changed. > > ~~~ > > 13. WaitForStandbyConfirmation > > + /* > + * We wait for the slots in the standby_slot_names to catch up, but we > + * use a timeout (1s) so we can also check the if the > + * standby_slot_names has been changed. > + */ > + ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, 1000, > + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION); > > Typo "the if the" > Changed. > ========== > src/backend/replication/slotfuncs.c > > 14. pg_physical_replication_slot_advance > + > + PhysicalWakeupLogicalWalSnd(); > > Should this have a comment to say what it is for? > Added. > ========== > src/backend/replication/walsender.c > > 15. > +/* > + * Wake up the logical walsender processes with failover enabled slots > +if the > + * currently acquired physical slot is specified in standby_slot_names GUC. > + */ > +void > +PhysicalWakeupLogicalWalSnd(void) > +{ > + ListCell *lc; > + List *standby_slots; > + > + Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot)); > + > + standby_slots = GetStandbySlotList(false); > + > + foreach(lc, standby_slots) > + { > + char *name = lfirst(lc); > + > + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) { > +ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv); > + return; > + } > + } > +} > > 15a. > There already exists another function called WalSndWakeup(bool physical, > bool logical), so I think this new one should use a similar name pattern -- e.g. > maybe like WalSndWakeupLogicalForSlotSync or ... WalSndWakeup is a general function for both physical and logical sender, but our new function is specific to physical sender which is more similar to PhysicalConfirmReceivedLocation/ PhysicalReplicationSlotNewXmin, so I think the current name is ok. > > ~ > > 15b. > IIRC there are some new List macros you can use instead of needing to declare > the ListCell? Changed. > > ========== > .../utils/activity/wait_event_names.txt > > 16. > +WAIT_FOR_STANDBY_CONFIRMATION "Waiting for the WAL to be received > by > physical standby." > > Moving the 'the' will make this more consistent with all other "Waiting for > WAL..." names. > > SUGGESTION > Waiting for WAL to be received by the physical standby. Changed. > > ========== > src/backend/utils/misc/guc_tables.c > > 17. 
> + { > + {"standby_slot_names", PGC_SIGHUP, REPLICATION_PRIMARY, > + gettext_noop("Lists streaming replication standby server slot " > + "names that logical WAL sender processes will wait for."), > + gettext_noop("Decoded changes are sent out to plugins by logical " > + "WAL sender processes only after specified " > + "replication slots confirm receiving WAL."), GUC_LIST_INPUT | > + GUC_LIST_QUOTE }, &standby_slot_names, "", check_standby_slot_names, > + assign_standby_slot_names, NULL }, > > The wording of the detail msg feels kind of backwards to me. > > BEFORE > Decoded changes are sent out to plugins by logical WAL sender processes > only after specified replication slots confirm receiving WAL. > > SUGGESTION > Logical WAL sender processes will send decoded changes to plugins only after > the specified replication slots confirm receiving WAL. Changed. > > ========== > src/backend/utils/misc/postgresql.conf.sample > > 18. > +#standby_slot_names = '' # streaming replication standby server slot > +names that # logical walsender processes will wait for > > I'm not sure this is the best GUC name. See the general comment #0 above in > this post. As discussed, I didn’t change this. > > ========== > src/include/replication/slot.h > > ========== > src/include/replication/walsender.h > > ========== > src/include/replication/walsender_private.h > > ========== > src/include/utils/guc_hooks.h > > ========== > src/test/recovery/t/006_logical_decoding.pl > > 19. > +# Pass failover=true (last-arg), it should not have any impact on advancing. > > SUGGESTION > Passing failover=true (last arg) should not have any impact on advancing. Changed. > > ========== > .../t/040_standby_failover_slots_sync.pl > > 20. > +# > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1 > +(failover = true) # | ----> subscriber2 (failover = false) > > In the diagram, the "--->" means a mixture of physical standbys and logical > pub/sub replication. Maybe it can be a bit clearer? > > SUGGESTION > > # primary (publisher) > # > # (physical standbys) > # | ----> standby1 (primary_slot_name = sb1_slot) > # | ----> standby2 (primary_slot_name = sb2_slot) > # > # (logical replication) > # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) > # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) > I think one can distinguish it based on the 'standby' and 'subscriber' as well, because 'standby' normally refer to physical standby while the other refer to logical. > ~~~ > > 21. > +# Set up is configured in such a way that the logical slot of > +subscriber1 is # enabled failover, thus it will wait for the physical > +slot of # standby1(sb1_slot) to catch up before sending decoded changes to > subscriber1. > > /is enabled failover/is enabled for failover/ Changed. > > ~~~ > > 22. > +# Create another subscriber node without enabling failover, wait for > +sync to # complete my $subscriber2 = > +PostgreSQL::Test::Cluster->new('subscriber2'); > +$subscriber2->init; > +$subscriber2->start; > +$subscriber2->safe_psql( > + 'postgres', qq[ > + CREATE TABLE tab_int (a int PRIMARY KEY); CREATE SUBSCRIPTION > +regress_mysub2 CONNECTION '$publisher_connstr' > PUBLICATION regress_mypub WITH (slot_name = lsub2_slot); > +]); > + > +$subscriber1->wait_for_subscription_sync; > + > > Is this meant to wait for 'subscription2'? Yes, fixed. > > ~~~ > > 23. 
> # Stop the standby associated with the specified physical replication slot so # > that the logical replication slot won't receive changes until the standby # > comes up. > > Maybe this can give the values for better understanding: > > SUGGESTION > Stop the standby associated with the specified physical replication slot > (sb1_slot) so that the logical replication slot (lsub1_slot) won't receive changes > until the standby comes up. Changed. > > ~~~ > > 24. > +# Wait for the standby that's up and running gets the data from primary > > SUGGESTION > Wait until the standby2 that's still running gets the data from the primary. > Changed. > ~~~ > > 25. > +# Wait for the subscription that's up and running and is not enabled > for failover. > +# It gets the data from primary without waiting for any standbys. > > SUGGESTION > Wait for subscription2 to get the data from the primary. This subscription was > not enabled for failover so it gets the data without waiting for any standbys. > Changed. > ~~~ > > 26. > +# The subscription that's up and running and is enabled for failover # > +doesn't get the data from primary and keeps waiting for the # standby > +specified in standby_slot_names. > > SUGGESTION > The subscription1 was enabled for failover so it doesn't get the data from > primary and keeps waiting for the standby specified in standby_slot_names > (sb1_slot aka standby1). > Changed. > ~~~ > > 27. > +# Start the standby specified in standby_slot_names and wait for it to > +catch # up with the primary. > > SUGGESTION > Start the standby specified in standby_slot_names (sb1_slot aka > standby1) and wait for it to catch up with the primary. > Changed. > ~~~ > > 28. > +# Now that the standby specified in standby_slot_names is up and > +running, # primary must send the decoded changes to subscription > +enabled for failover # While the standby was down, this subscriber > +didn't receive any data from # primary i.e. the primary didn't allow it to go > ahead of standby. > > SUGGESTION > Now that the standby specified in standby_slot_names is up and running, the > primary can send the decoded changes to the subscription enabled for failover > (i.e. subscription1). While the standby was down, > subscription1 didn't receive any data from the primary. i.e. the primary didn't > allow it to go ahead of standby. > Changed. > ~~~ > > 29. > +# Stop the standby associated with the specified physical replication > +slot so # that the logical replication slot won't receive changes until > +the standby # slot's restart_lsn is advanced or the slot is removed > +from the # standby_slot_names list. > +$primary->safe_psql('postgres', "TRUNCATE tab_int;"); > +$primary->wait_for_catchup('regress_mysub1'); > +$standby1->stop; > > Isn't this fragment more like the first step of the *next* TEST instead of the last > step of this one? > Changed. > ~~~ > > 30. > +################################################## > +# Verify that when using pg_logical_slot_get_changes to consume changes > +from a # logical slot with failover enabled, it will also wait for the > +slots specified # in standby_slot_names to catch up. > +################################################## > > AFAICT this test is checking only that the function cannot return while waiting > for the stopped standby, but it doesn't seem to check that it *does* return > when the stopped standby comes alive again. > Will think about this. > ~~~ > > 31. 
> +$result = > + $subscriber1->safe_psql('postgres', "SELECT count(*) = 0 FROM > +tab_int;"); is($result, 't', > + "subscriber1 doesn't get data as the sb1_slot doesn't catch up"); > > Do you think this fragment should have a comment? Added. Attach the V100 patch set which addressed above comments. Best Regards, Hou zj
Attachment
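To make the pieces discussed in review comments 6 through 9 above easier to follow as a whole, the check hook under review roughly takes the shape below once the empty-string fast path, the wildcard rejection, and the list validation are put together. This is a condensed illustration reconstructed from the quoted hunks rather than the exact patch text, and it assumes validate_standby_slots() sits in the same file (slot.c), as it does in the patch:

#include "postgres.h"
#include "replication/slot.h"
#include "utils/guc.h"
#include "utils/guc_hooks.h"

bool
check_standby_slot_names(char **newval, void **extra, GucSource source)
{
	/* The default (empty) value needs no validation. */
	if (**newval == '\0')
		return true;

	/*
	 * Unlike synchronous_standby_names, the wildcard form is rejected
	 * outright (review comment 8 above).
	 */
	if (strcmp(*newval, "*") == 0)
	{
		GUC_check_errdetail("Wildcard \"*\" is not accepted for standby_slot_names.");
		return false;
	}

	/*
	 * Check the list syntax and, once the shared slot data exists, verify
	 * that every named slot is an existing physical slot (review comment 6).
	 */
	return validate_standby_slots(newval);
}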
On Mon, Feb 26, 2024 at 9:13 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > I think to set secure search path for remote connection, the standard approach > > > could be to extend the code in libpqrcv_connect[1], so that we don't need to schema > > > qualify all the operators in the queries. > > > > > > And for local connection, I agree it's also needed to add a > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > [1] > > > libpqrcv_connect > > > ... > > > if (logical) > > > ... > > > res = libpqrcv_PQexec(conn->streamConn, > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot). > > Thanks for the patch, changes look good. I have corporated it in the > patch which addresses the rest of your comments in [1]. I have > attached the patch as .txt > Few comments: =============== 1. - if (logical) + if (logical || !replication) { Can we add a comment about connection types that require ALWAYS_SECURE_SEARCH_PATH_SQL? 2. Can we add a test case to demonstrate that the '=' operator can be hijacked to do different things when the slotsync worker didn't use ALWAYS_SECURE_SEARCH_PATH_SQL? -- With Regards, Amit Kapila.
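On point 1, one possible wording for such a comment is sketched below. The condition and the ALWAYS_SECURE_SEARCH_PATH_SQL call are the ones from the hunk quoted earlier in the thread; the comment text itself is only a suggestion, and the existing result checking that follows the call is left out:

	/*
	 * Any connection that can run SQL must get a known, empty search_path:
	 * regular (non-replication) connections such as the one made by the
	 * slot synchronization worker, and logical replication connections,
	 * which can also execute SQL over the replication protocol.  Physical
	 * replication connections never run SQL commands, so they are exempt.
	 */
	if (logical || !replication)
	{
		PGresult   *res;

		res = libpqrcv_PQexec(conn->streamConn,
							  ALWAYS_SECURE_SEARCH_PATH_SQL);
		/* ... existing result checking and error reporting, unchanged ... */
	}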
Hi, On Wed, Feb 28, 2024 at 08:49:19AM +0530, Amit Kapila wrote: > On Mon, Feb 26, 2024 at 9:13 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > I think to set secure search path for remote connection, the standard approach > > > > could be to extend the code in libpqrcv_connect[1], so that we don't need to schema > > > > qualify all the operators in the queries. > > > > > > > > And for local connection, I agree it's also needed to add a > > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > > > [1] > > > > libpqrcv_connect > > > > ... > > > > if (logical) > > > > ... > > > > res = libpqrcv_PQexec(conn->streamConn, > > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot). > > > > Thanks for the patch, changes look good. I have corporated it in the > > patch which addresses the rest of your comments in [1]. I have > > attached the patch as .txt > > > > Few comments: > =============== > 1. > - if (logical) > + if (logical || !replication) > { > > Can we add a comment about connection types that require > ALWAYS_SECURE_SEARCH_PATH_SQL? Yeah, will do. > > 2. > Can we add a test case to demonstrate that the '=' operator can be > hijacked to do different things when the slotsync worker didn't use > ALWAYS_SECURE_SEARCH_PATH_SQL? I don't think that's good to create a test to show how to hijack an operator within a background worker. I had a quick look and did not find existing tests in this area around ALWAYS_SECURE_SEARCH_PATH_SQL / search_patch and background worker. Such a test would: - "just" ensure that search_path works as expected - show how to hijack an operator within a background worker Based on the above I don't think that such a test is worth it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wednesday, February 28, 2024 2:38 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > On Wed, Feb 28, 2024 at 08:49:19AM +0530, Amit Kapila wrote: > > On Mon, Feb 26, 2024 at 9:13 AM shveta malik <shveta.malik@gmail.com> > wrote: > > > > > > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > Hi, > > > > > I think to set secure search path for remote connection, the > > > > > standard approach could be to extend the code in > > > > > libpqrcv_connect[1], so that we don't need to schema qualify all the > operators in the queries. > > > > > > > > > > And for local connection, I agree it's also needed to add a > > > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > > > > > [1] > > > > > libpqrcv_connect > > > > > ... > > > > > if (logical) > > > > > ... > > > > > res = libpqrcv_PQexec(conn->streamConn, > > > > > > > > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot). > > > > > > Thanks for the patch, changes look good. I have corporated it in the > > > patch which addresses the rest of your comments in [1]. I have > > > attached the patch as .txt > > > > > > > Few comments: > > =============== > > 1. > > - if (logical) > > + if (logical || !replication) > > { > > > > Can we add a comment about connection types that require > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > Yeah, will do. > > > > > 2. > > Can we add a test case to demonstrate that the '=' operator can be > > hijacked to do different things when the slotsync worker didn't use > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > I don't think that's good to create a test to show how to hijack an operator > within a background worker. > > I had a quick look and did not find existing tests in this area around > ALWAYS_SECURE_SEARCH_PATH_SQL / search_patch and background worker. I think a similar commit 11da970 has added a test for the search_path, e.g. # Create some preexisting content on publisher $node_publisher->safe_psql( 'postgres', "CREATE FUNCTION public.pg_get_replica_identity_index(int) RETURNS regclass LANGUAGE sql AS 'SELECT 1/0'"); # shall not call Best Regards, Hou zj
On Wed, Feb 28, 2024 at 8:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Few comments: Thanks for the feedback. > =============== > 1. > - if (logical) > + if (logical || !replication) > { > > Can we add a comment about connection types that require > ALWAYS_SECURE_SEARCH_PATH_SQL? > > 2. > Can we add a test case to demonstrate that the '=' operator can be > hijacked to do different things when the slotsync worker didn't use > ALWAYS_SECURE_SEARCH_PATH_SQL? > Here is the patch with new test added and improved comments. thanks Shveta
Attachment
Hi, On Wed, Feb 28, 2024 at 06:48:37AM +0000, Zhijie Hou (Fujitsu) wrote: > On Wednesday, February 28, 2024 2:38 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 08:49:19AM +0530, Amit Kapila wrote: > > > On Mon, Feb 26, 2024 at 9:13 AM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > > > On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > > I think to set secure search path for remote connection, the > > > > > > standard approach could be to extend the code in > > > > > > libpqrcv_connect[1], so that we don't need to schema qualify all the > > operators in the queries. > > > > > > > > > > > > And for local connection, I agree it's also needed to add a > > > > > > SetConfigOption("search_path", "" call in the slotsync worker. > > > > > > > > > > > > [1] > > > > > > libpqrcv_connect > > > > > > ... > > > > > > if (logical) > > > > > > ... > > > > > > res = libpqrcv_PQexec(conn->streamConn, > > > > > > > > > > > > ALWAYS_SECURE_SEARCH_PATH_SQL); > > > > > > > > > > > > > > > > Agree, something like in the attached? (it's .txt to not disturb the CF bot). > > > > > > > > Thanks for the patch, changes look good. I have corporated it in the > > > > patch which addresses the rest of your comments in [1]. I have > > > > attached the patch as .txt > > > > > > > > > > Few comments: > > > =============== > > > 1. > > > - if (logical) > > > + if (logical || !replication) > > > { > > > > > > Can we add a comment about connection types that require > > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > > > Yeah, will do. > > > > > > > > 2. > > > Can we add a test case to demonstrate that the '=' operator can be > > > hijacked to do different things when the slotsync worker didn't use > > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > > > I don't think that's good to create a test to show how to hijack an operator > > within a background worker. > > > > I had a quick look and did not find existing tests in this area around > > ALWAYS_SECURE_SEARCH_PATH_SQL / search_patch and background worker. > > I think a similar commit 11da970 has added a test for the search_path, e.g. Oh right, thanks for sharing! But do we think it's worth to show how to hijack an operator within a background worker "just" to verify that the search_path works as expected? I don't think it's worth it but will do if others have different opinions. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Feb 28, 2024 at 12:29:01PM +0530, shveta malik wrote: > On Wed, Feb 28, 2024 at 8:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Few comments: > > Thanks for the feedback. > > > =============== > > 1. > > - if (logical) > > + if (logical || !replication) > > { > > > > Can we add a comment about connection types that require > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > > > 2. > > Can we add a test case to demonstrate that the '=' operator can be > > hijacked to do different things when the slotsync worker didn't use > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > > > Here is the patch with new test added and improved comments. Thanks! A few comments: 1 === + * used to run normal SQL queries s/run normal SQL/run SQL/ ? As mentioned up-thread I don't like that much the idea of creating such a test but if we do then here are my comments: 2 === +CREATE FUNCTION myschema.myintne(bigint, int) Should we explain why 'bigint, int' is important here (instead of 'int, int')? 3 === +# stage of syncing newly created slots. If the worker was not prepared +# to handle such attacks, it would have failed during Worth to mention the underlying check / function that would get an "unexpected" result? Except for the above (nit) comments the patch looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Feb 28, 2024 at 12:31 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 06:48:37AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Wednesday, February 28, 2024 2:38 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > 2. > > > > Can we add a test case to demonstrate that the '=' operator can be > > > > hijacked to do different things when the slotsync worker didn't use > > > > ALWAYS_SECURE_SEARCH_PATH_SQL? > > > > > > I don't think that's good to create a test to show how to hijack an operator > > > within a background worker. > > > > > > I had a quick look and did not find existing tests in this area around > > > ALWAYS_SECURE_SEARCH_PATH_SQL / search_patch and background worker. > > > > I think a similar commit 11da970 has added a test for the search_path, e.g. > > Oh right, thanks for sharing! > > But do we think it's worth to show how to hijack an operator within a background > worker "just" to verify that the search_path works as expected? > > I don't think it's worth it but will do if others have different opinions. > I think it is important to add this test because if we break this behavior for any reason it will be a security hazard. Now, if adding it increases the timing of the test too much then we should rethink but otherwise, I don't see any reason not to add this test. -- With Regards, Amit Kapila.
On Wed, Feb 28, 2024 at 1:33 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > A few comments: Thanks for reviewing. > > 1 === > > + * used to run normal SQL queries > > s/run normal SQL/run SQL/ ? > > As mentioned up-thread I don't like that much the idea of creating such a test > but if we do then here are my comments: > > 2 === > > +CREATE FUNCTION myschema.myintne(bigint, int) > > Should we explain why 'bigint, int' is important here (instead of > 'int, int')? > > 3 === > > +# stage of syncing newly created slots. If the worker was not prepared > +# to handle such attacks, it would have failed during > > Worth to mention the underlying check / function that would get an "unexpected" > result? > > Except for the above (nit) comments the patch looks good to me. Here is the patch which addresses the above comments. Also optimized the test a little bit. Now we use pg_sync_replication_slots() function instead of worker to test the operator-redirection using search_path. This has been done to simplify the test case and reduce the added time. thanks Shveta
Attachment
On Wed, Feb 28, 2024 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > > Here is the patch which addresses the above comments. Also optimized > the test a little bit. Now we use pg_sync_replication_slots() function > instead of worker to test the operator-redirection using search_path. > This has been done to simplify the test case and reduce the added > time. > I have slightly adjusted the comments in the attached, otherwise, LGTM. -- With Regards, Amit Kapila.
Attachment
Hi, On Wed, Feb 28, 2024 at 02:23:27AM +0000, Zhijie Hou (Fujitsu) wrote: > Attach the V100 patch set which addressed above comments. Thanks! A few random comments: 1 === + if (!ok) + { + GUC_check_errdetail("List syntax is invalid."); + } What about to get rid of the brackets here? 2 === + + /* + * If the replication slots' data have been initialized, verify if the + * specified slots exist and are logical slots. + */ remove the empty line above the comment? 3 === +check_standby_slot_names(char **newval, void **extra, GucSource source) +{ + if ((*newval)[0] == '\0') + return true; I think "**newval == '\0'" is easier to read but that's a matter of taste and check_synchronous_standby_names() is already using the same so it's a nit. 4 === Regarding the test, what about adding one to test the "new" behavior discussed up-thread? (logical replication will wait if slot mentioned in standby_slot_names is dropped and/or does not exist when the engine starts?) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Feb 28, 2024 at 04:50:55PM +0530, Amit Kapila wrote: > On Wed, Feb 28, 2024 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Here is the patch which addresses the above comments. Also optimized > > the test a little bit. Now we use pg_sync_replication_slots() function > > instead of worker to test the operator-redirection using search-patch. > > This has been done to simplify the test case and reduce the added > > time. Thanks! > I have slightly adjusted the comments in the attached, otherwise, LGTM. Same here, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Feb 28, 2024 at 10:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Here is the patch which addresses the above comments. Also optimized > > the test a little bit. Now we use pg_sync_replication_slots() function > > instead of worker to test the operator-redirection using search-patch. > > This has been done to simplify the test case and reduce the added > > time. > > > > I have slightly adjusted the comments in the attached, otherwise, LGTM. > > -- - if (logical) + /* + * Set always-secure search path for the cases where the connection is + * used to run SQL queries, so malicious users can't get control. + */ + if (logical || !replication) { PGresult *res; I found this condition a bit confusing. According to the libpqrcv_connect function comment: * This function can be used for both replication and regular connections. * If it is a replication connection, it could be either logical or physical * based on input argument 'logical'. IIUC that comment is saying the 'replication' flag is like the main categorization and the 'logical' flag is like a subcategory (for when 'replication' is true). Therefore, won't the modified check be better to be written the other way around? This will also be consistent with the way the Assert was written. SUGGESTION if (!replication || logical) { ... ====== Kind Regards, Peter Smith. Fujitsu Australia
On Monday, February 26, 2024 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Attach the V98 patch set which addressed above comments. > > > > Few comments: > ============= > 1. > WalSndWaitForWal(XLogRecPtr loc) > { > int wakeEvents; > + bool wait_for_standby = false; > + uint32 wait_event; > + List *standby_slots = NIL; > static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr; > > + if (MyReplicationSlot->data.failover && replication_active) > + standby_slots = GetStandbySlotList(true); > + > /* > - * Fast path to avoid acquiring the spinlock in case we already know we > - * have enough WAL available. This is particularly interesting if we're > - * far behind. > + * Check if all the standby servers have confirmed receipt of WAL up to > + * RecentFlushPtr even when we already know we have enough WAL available. > + * > + * Note that we cannot directly return without checking the status of > + * standby servers because the standby_slot_names may have changed, > + which > + * means there could be new standby slots in the list that have not yet > + * caught up to the RecentFlushPtr. > */ > - if (RecentFlushPtr != InvalidXLogRecPtr && > - loc <= RecentFlushPtr) > - return RecentFlushPtr; > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr) { > + FilterStandbySlots(RecentFlushPtr, &standby_slots); > > I think even if the slot list is not changed, we will always process each slot > mentioned in standby_slot_names once. Can't we cache the previous list of > slots for we have already waited for? In that case, we won't even need to copy > the list via GetStandbySlotList() unless we need to wait. > > 2. > WalSndWaitForWal(XLogRecPtr loc) > { > + /* > + * Update the standby slots that have not yet caught up to the flushed > + * position. It is good to wait up to RecentFlushPtr and then let it > + * send the changes to logical subscribers one by one which are > + * already covered in RecentFlushPtr without needing to wait on every > + * change for standby confirmation. > + */ > + if (wait_for_standby) > + FilterStandbySlots(RecentFlushPtr, &standby_slots); > + > /* Update our idea of the currently flushed position. */ > - if (!RecoveryInProgress()) > + else if (!RecoveryInProgress()) > RecentFlushPtr = GetFlushRecPtr(NULL); > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); ... > /* > * If postmaster asked us to stop, don't wait anymore. > * > * It's important to do this check after the recomputation of > * RecentFlushPtr, so we can send all remaining data before shutting > * down. > */ > if (got_STOPPING) > break; > > I think because 'wait_for_standby' may not be set in the first or consecutive > cycles we may send the WAL to the logical subscriber before sending it to the > physical subscriber during shutdown. Here is the v101 patch set which addressed above comments. This version will cache the oldest standby slot's LSN each time we waited for them to catch up. The cached LSN is invalidated when we reload the GUC config. In the WalSndWaitForWal function, instead of traversing the entire standby list each time, we can check the cached LSN to quickly determine if the standbys have caught up. When a shutdown signal is received, we continue to wait for the standby slots to catch up. When waiting for the standbys to catch up after receiving the shutdown signal, an ERROR is reported if any slots are dropped, invalidated, or inactive. 
This measure is taken to prevent the walsender from waiting indefinitely. Best Regards, Hou zj
Attachment
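To visualize the caching scheme described above (remember the oldest confirmed LSN among the standby_slot_names slots after a wait, invalidate it when the GUC is reloaded, and consult it before walking the slot list again in WalSndWaitForWal), a minimal sketch could look like this. Every name below is a placeholder chosen for illustration and does not match the identifiers used in the v101 patch:

#include "postgres.h"
#include "access/xlogdefs.h"

/*
 * Oldest confirmed-flush LSN among the standby_slot_names slots, or
 * InvalidXLogRecPtr when unknown (never computed, or the GUC changed).
 */
static XLogRecPtr cached_standby_flush_lsn = InvalidXLogRecPtr;

/*
 * Fast path for WalSndWaitForWal(): if the standbys were already known to
 * have confirmed an LSN at least as new as 'loc', there is nothing to wait
 * for and the slot list does not need to be traversed again.
 */
static bool
standbys_known_caught_up(XLogRecPtr loc)
{
	return !XLogRecPtrIsInvalid(cached_standby_flush_lsn) &&
		loc <= cached_standby_flush_lsn;
}

/* Called after a wait completes: remember how far the slowest standby got. */
static void
remember_oldest_standby_flush(XLogRecPtr oldest_flush)
{
	cached_standby_flush_lsn = oldest_flush;
}

/*
 * Called from the standby_slot_names assign hook: a new value may name
 * slots we have never waited for, so the cached position is meaningless.
 */
static void
invalidate_standby_flush_cache(void)
{
	cached_standby_flush_lsn = InvalidXLogRecPtr;
}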
On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, February 27, 2024 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote: ... > > 20. > > +# > > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 > > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1 > > +(failover = true) # | ----> subscriber2 (failover = false) > > > > In the diagram, the "--->" means a mixture of physical standbys and logical > > pub/sub replication. Maybe it can be a bit clearer? > > > > SUGGESTION > > > > # primary (publisher) > > # > > # (physical standbys) > > # | ----> standby1 (primary_slot_name = sb1_slot) > > # | ----> standby2 (primary_slot_name = sb2_slot) > > # > > # (logical replication) > > # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) > > # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as well, because > 'standby' normally refer to physical standby while the other refer to logical. > Ok, but shouldn't it at least include info about the logical slot names associated with the subscribers (slot_name = lsub1_slot, slot_name = lsub2_slot) like suggested above? ====== Here are some more review comments for v100-0001 ====== doc/src/sgml/config.sgml 1. + <para> + Lists the streaming replication standby server slot names that logical + WAL sender processes will wait for. Logical WAL sender processes will + send decoded changes to plugins only after the specified replication + slots confirm receiving WAL. This guarantees that logical replication + slots with failover enabled do not consume changes until those changes + are received and flushed to corresponding physical standbys. If a + logical replication connection is meant to switch to a physical standby + after the standby is promoted, the physical replication slot for the + standby should be listed here. Note that logical replication will not + proceed if the slots specified in the standby_slot_names do not exist or + are invalidated. + </para> Is that note ("Note that logical replication will not proceed if the slots specified in the standby_slot_names do not exist or are invalidated") meant only for subscriptions marked for 'failover' or any subscription? Maybe wording can be modified to help clarify it? ====== src/backend/replication/slot.c 2. +/* + * A helper function to validate slots specified in GUC standby_slot_names. + */ +static bool +validate_standby_slots(char **newval) +{ + char *rawname; + List *elemlist; + bool ok; + + /* Need a modifiable copy of string */ + rawname = pstrdup(*newval); + + /* Verify syntax and parse string into a list of identifiers */ + ok = SplitIdentifierString(rawname, ',', &elemlist); + + if (!ok) + { + GUC_check_errdetail("List syntax is invalid."); + } + + /* + * If the replication slots' data have been initialized, verify if the + * specified slots exist and are logical slots. + */ + else if (ReplicationSlotCtl) + { + foreach_ptr(char, name, elemlist) + { + ReplicationSlot *slot; + + slot = SearchNamedReplicationSlot(name, true); + + if (!slot) + { + GUC_check_errdetail("replication slot \"%s\" does not exist", + name); + ok = false; + break; + } + + if (!SlotIsPhysical(slot)) + { + GUC_check_errdetail("\"%s\" is not a physical replication slot", + name); + ok = false; + break; + } + } + } + + pfree(rawname); + list_free(elemlist); + return ok; +} 2a. 
I didn't mention this previously because I thought this function was not going to change anymore, but since Bertrand suggested some changes [1], I will say IMO the { } are fine here for the single statement, but I think it will be better to rearrange this code to be like below. Having a 2nd NOP 'else' gives a much better place where you can put your ReplicationSlotCtl comment. if (!ok) { GUC_check_errdetail("List syntax is invalid."); } else if (!ReplicationSlotCtl) { <Insert big comment here about why it is OK to skip when ReplicationSlotCtl is NULL> } else { foreach_ptr ... } ~ 2b. In any case even if don't refactor anything I still think you need to extend that comment to explain how skipping ReplicationSlotCtl is NULL is only OK because there will be some later checking also done in the FilterStandbySlots() function to catch any config problems. ------ [1] https://www.postgresql.org/message-id/Zd8ahZXw82ieFxX/%40ip-10-97-1-34.eu-west-3.compute.internal Kind Regards, Peter Smith. Fujitsu Australia
On Wednesday, February 28, 2024 7:36 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 02:23:27AM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach the V100 patch set which addressed above comments. > > A few random comments: Thanks for the comments! > > 1 === > > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); > + } > > What about to get rid of the brackets here? I personally prefer the current style. > > 2 === > > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > > remove the empty line above the comment? I feel it is cleaner to have an empty line before the comment. > > 3 === > > +check_standby_slot_names(char **newval, void **extra, GucSource source) > +{ > + if ((*newval)[0] == '\0') > + return true; > > I think "**newval == '\0'" is easier to read but that's a matter of taste and > check_synchronous_standby_names() is already using the same so it's a nit. I don't have a strong opinion on this, so I will change it if others feel the same. > > 4 === > > Regarding the test, what about adding one to test the "new" behavior > discussed up-thread? (logical replication will wait if slot mentioned in > standby_slot_names is dropped and/or does not exist when the engine starts?) Will think about this. Best Regards, Hou zj
On Tue, Feb 27, 2024 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Feb 27, 2024 at 12:48 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are some review comments for v99-0001 > > > > ========== > > 0. GENERAL. > > > > +#standby_slot_names = '' # streaming replication standby server slot names that > > + # logical walsender processes will wait for > > > > IMO the GUC name is too generic. There is nothing in this name to > > suggest it has anything to do with logical slot synchronization; that > > meaning is only found in the accompanying comment -- it would be > > better if the GUC name itself were more self-explanatory. > > > > e.g. Maybe like 'wal_sender_sync_standby_slot_names' or > > 'wal_sender_standby_slot_names', 'wal_sender_wait_for_standby_slots', > > or ... > > > > It would be wrong and or misleading to append wal_sender to this GUC > name as this is used during SQL APIs as well. Fair enough, but the fact that some SQL functions might wait is also not mentioned in the config file comment, nor in the documentation for GUC 'standby_slot_names'. Seems like a docs omission? > Also, adding wait sounds > more like a boolean. So, I don't see the proposed names any better > than the current one. > Anyway, the point is that the current GUC name 'standby_slot_names' is not ideal IMO because it doesn't have enough meaning by itself -- e.g. you have to read the accompanying comment or documentation to have any idea of its purpose. My suggested GUC names were mostly just to get people thinking about it. Maybe others can come up with better names. ====== Kind Regards, Peter Smith. Fujitsu Australia
Hi, On Thu, Feb 29, 2024 at 10:43:07AM +1100, Peter Smith wrote: > - if (logical) > + /* > + * Set always-secure search path for the cases where the connection is > + * used to run SQL queries, so malicious users can't get control. > + */ > + if (logical || !replication) > { > PGresult *res; > > I found this condition a bit confusing. According to the > libpqrcv_connect function comment: > > * This function can be used for both replication and regular connections. > * If it is a replication connection, it could be either logical or physical > * based on input argument 'logical'. > > IIUC that comment is saying the 'replication' flag is like the main > categorization and the 'logical' flag is like a subcategory (for when > 'replication' is true). Therefore, won't the modified check be better > to be written the other way around? This will also be consistent with > the way the Assert was written. > > SUGGESTION > if (!replication || logical) > { > ... Thanks for the review! Yeah, that makes sense from a categorization point of view. Out of curiosity, I checked which condition returns true most of the time. Looking at the walrcv_connect calls: logical 6 times, !replication 2 times (only for sync-slot-related stuff). Looking at check-world coverage: logical 1006 times, !replication 16 times. So, according to the above, the initially proposed order, "if (logical || !replication)", provides the benefit of avoiding the second check on !replication most of the time (at least during check-world). Of course it also depends on whether the slot sync feature (the only one that makes use of !replication) is used or not. Based on the above, I did prefer the original proposal but I think we can keep what has been pushed (Peter's proposal). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Feb 29, 2024 at 8:29 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Tuesday, February 27, 2024 3:18 PM Peter Smith <smithpb2250@gmail.com> wrote: > ... > > > 20. > > > +# > > > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 > > > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1 > > > +(failover = true) # | ----> subscriber2 (failover = false) > > > > > > In the diagram, the "--->" means a mixture of physical standbys and logical > > > pub/sub replication. Maybe it can be a bit clearer? > > > > > > SUGGESTION > > > > > > # primary (publisher) > > > # > > > # (physical standbys) > > > # | ----> standby1 (primary_slot_name = sb1_slot) > > > # | ----> standby2 (primary_slot_name = sb2_slot) > > > # > > > # (logical replication) > > > # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) > > > # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) > > > > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as well, because > > 'standby' normally refer to physical standby while the other refer to logical. > > I think Peter's suggestion will make the setup clear. > > Ok, but shouldn't it at least include info about the logical slot > names associated with the subscribers (slot_name = lsub1_slot, > slot_name = lsub2_slot) like suggested above? > > ====== > > Here are some more review comments for v100-0001 > > ====== > doc/src/sgml/config.sgml > > 1. > + <para> > + Lists the streaming replication standby server slot names that logical > + WAL sender processes will wait for. Logical WAL sender processes will > + send decoded changes to plugins only after the specified replication > + slots confirm receiving WAL. This guarantees that logical replication > + slots with failover enabled do not consume changes until those changes > + are received and flushed to corresponding physical standbys. If a > + logical replication connection is meant to switch to a physical standby > + after the standby is promoted, the physical replication slot for the > + standby should be listed here. Note that logical replication will not > + proceed if the slots specified in the standby_slot_names do > not exist or > + are invalidated. > + </para> > > Is that note ("Note that logical replication will not proceed if the > slots specified in the standby_slot_names do not exist or are > invalidated") meant only for subscriptions marked for 'failover' or > any subscription? Maybe wording can be modified to help clarify it? > I think it is implicit that here we are talking about failover slots. I think clarifying again the same could be repetitive considering the previous sentence: "This guarantees that logical replication slots with failover enabled do not consume .." have mentioned it. > ====== > src/backend/replication/slot.c > > 2. > +/* > + * A helper function to validate slots specified in GUC standby_slot_names. 
> + */ > +static bool > +validate_standby_slots(char **newval) > +{ > + char *rawname; > + List *elemlist; > + bool ok; > + > + /* Need a modifiable copy of string */ > + rawname = pstrdup(*newval); > + > + /* Verify syntax and parse string into a list of identifiers */ > + ok = SplitIdentifierString(rawname, ',', &elemlist); > + > + if (!ok) > + { > + GUC_check_errdetail("List syntax is invalid."); > + } > + > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + else if (ReplicationSlotCtl) > + { > + foreach_ptr(char, name, elemlist) > + { > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + { > + GUC_check_errdetail("replication slot \"%s\" does not exist", > + name); > + ok = false; > + break; > + } > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); > + ok = false; > + break; > + } > + } > + } > + > + pfree(rawname); > + list_free(elemlist); > + return ok; > +} > > 2a. > I didn't mention this previously because I thought this function was > not going to change anymore, but since Bertrand suggested some changes > [1], I will say IMO the { } are fine here for the single statement, > but I think it will be better to rearrange this code to be like below. > Having a 2nd NOP 'else' gives a much better place where you can put > your ReplicationSlotCtl comment. > > if (!ok) > { > GUC_check_errdetail("List syntax is invalid."); > } > else if (!ReplicationSlotCtl) > { > <Insert big comment here about why it is OK to skip when > ReplicationSlotCtl is NULL> > } > else > { > foreach_ptr ... > } > +1. This will make the code and reasoning to skip clear. Few additional comments on the latest patch: ================================= 1. static XLogRecPtr WalSndWaitForWal(XLogRecPtr loc) { ... + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) + { + /* + * Fast path to avoid acquiring the spinlock in case we already know + * we have enough WAL available and all the standby servers have + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly + * interesting if we're far behind. + */ return RecentFlushPtr; + } ... ... + * Wait for WALs to be flushed to disk only if the postmaster has not + * asked us to stop. + */ + if (loc > RecentFlushPtr && !got_STOPPING) + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; + + /* + * Check if the standby slots have caught up to the flushed position. + * It is good to wait up to RecentFlushPtr and then let it send the + * changes to logical subscribers one by one which are already covered + * in RecentFlushPtr without needing to wait on every change for + * standby confirmation. Note that after receiving the shutdown signal, + * an ERROR is reported if any slots are dropped, invalidated, or + * inactive. This measure is taken to prevent the walsender from + * waiting indefinitely. + */ + else if (replication_active && + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) + { + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; + wait_for_standby = true; + } + else + { + /* Already caught up and doesn't need to wait for standby_slots. */ break; + } ... } Can we try to move these checks into a separate function that returns a boolean and has an out parameter as wait_event? 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup? 
-- With Regards, Amit Kapila.
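For point 1 above, one possible shape for such a helper is sketched below: it reports via an out parameter which wait event applies and returns whether the walsender still has to wait. The function name is invented for illustration; replication_active, StandbyConfirmedFlush() and the two wait events are taken from the quoted v101 hunk, and the got_STOPPING special case is omitted for brevity:

/* Would live in walsender.c next to WalSndWaitForWal(); illustrative only. */
static bool
WalSndNeedsToWait(XLogRecPtr loc, XLogRecPtr flushed_lsn, uint32 *wait_event)
{
	/* The WAL we want to send is not flushed locally yet. */
	if (loc > flushed_lsn)
	{
		*wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL;
		return true;
	}

	/*
	 * The WAL exists locally, but some standby named in standby_slot_names
	 * has not confirmed receiving it yet.
	 */
	if (replication_active && !StandbyConfirmedFlush(flushed_lsn, WARNING))
	{
		*wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION;
		return true;
	}

	/* Caught up on both counts; the caller can stop waiting. */
	*wait_event = 0;
	return false;
}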
On Thu, Feb 29, 2024 at 9:13 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Tue, Feb 27, 2024 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > Also, adding wait sounds > > more like a boolean. So, I don't see the proposed names any better > > than the current one. > > > > Anyway, the point is that the current GUC name 'standby_slot_names' is > not ideal IMO because it doesn't have enough meaning by itself -- e.g. > you have to read the accompanying comment or documentation to have any > idea of its purpose. > Yeah, one has to read the description but that is true for other parameters like "temp_tablespaces". I don't have any better ideas but open to suggestions. -- With Regards, Amit Kapila.
On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few additional comments on the latest patch: > ================================= > 1. > static XLogRecPtr > WalSndWaitForWal(XLogRecPtr loc) > { > ... > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) > + { > + /* > + * Fast path to avoid acquiring the spinlock in case we already know > + * we have enough WAL available and all the standby servers have > + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly > + * interesting if we're far behind. > + */ > return RecentFlushPtr; > + } > ... > ... > + * Wait for WALs to be flushed to disk only if the postmaster has not > + * asked us to stop. > + */ > + if (loc > RecentFlushPtr && !got_STOPPING) > + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > + > + /* > + * Check if the standby slots have caught up to the flushed position. > + * It is good to wait up to RecentFlushPtr and then let it send the > + * changes to logical subscribers one by one which are already covered > + * in RecentFlushPtr without needing to wait on every change for > + * standby confirmation. Note that after receiving the shutdown signal, > + * an ERROR is reported if any slots are dropped, invalidated, or > + * inactive. This measure is taken to prevent the walsender from > + * waiting indefinitely. > + */ > + else if (replication_active && > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) > + { > + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > + wait_for_standby = true; > + } > + else > + { > + /* Already caught up and doesn't need to wait for standby_slots. */ > break; > + } > ... > } > > Can we try to move these checks into a separate function that returns > a boolean and has an out parameter as wait_event? > > 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup? > Some more comments: ================== 1. + foreach_ptr(char, name, elemlist) + { + ReplicationSlot *slot; + + slot = SearchNamedReplicationSlot(name, true); + + if (!slot) + { + GUC_check_errdetail("replication slot \"%s\" does not exist", + name); + ok = false; + break; + } + + if (!SlotIsPhysical(slot)) + { + GUC_check_errdetail("\"%s\" is not a physical replication slot", + name); + ok = false; + break; + } + } Won't the second check need protection via ReplicationSlotControlLock? 2. In WaitForStandbyConfirmation(), we are anyway calling StandbyConfirmedFlush() before the actual sleep in the loop, so does calling it at the beginning of the function will serve any purpose? If so, it is better to add some comments explaining the same. 3. Also do we need to perform the validation checks done in StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We can probably separate those checks and perform them just once. 4. + * + * XXX: If needed, this can be attempted in future. Remove this part of the comment. -- With Regards, Amit Kapila.
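On comment 1, one way to close the window would be to hold ReplicationSlotControlLock in shared mode across both the lookup and the SlotIsPhysical() check, so a concurrent DROP/CREATE cannot swap the slot out underneath the loop. The fragment below is only a sketch of that idea applied to the loop quoted above; ok and elemlist are the local variables of the quoted function, and need_lock is passed as false because the lock is already held:

	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

	foreach_ptr(char, name, elemlist)
	{
		ReplicationSlot *slot;

		/* need_lock = false: we already hold ReplicationSlotControlLock */
		slot = SearchNamedReplicationSlot(name, false);

		if (!slot)
		{
			GUC_check_errdetail("replication slot \"%s\" does not exist",
								name);
			ok = false;
			break;
		}

		if (!SlotIsPhysical(slot))
		{
			GUC_check_errdetail("\"%s\" is not a physical replication slot",
								name);
			ok = false;
			break;
		}
	}

	LWLockRelease(ReplicationSlotControlLock);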
On Thu, Feb 29, 2024 at 7:04 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Here is the v101 patch set which addressed above comments. > Thanks for the patch. Few comments: 1) Shall we mention in doc that shutdown will wait for standbys in standby_slot_names to confirm receiving WAL: Suggestion for logicaldecoding.sgml: When <varname>standby_slot_names</varname> is utilized, the primary server will not completely shut down until the corresponding standbys, associated with the physical replication slots specified in <varname>standby_slot_names</varname>, have confirmed receiving the WAL up to the latest flushed position on the primary server. slot.c 2) /* * If a logical slot name is provided in standby_slot_names, report * a message and skip it. Although logical slots are disallowed in * the GUC check_hook(validate_standby_slots), it is still * possible for a user to drop an existing physical slot and * recreate a logical slot with the same name. */ This is not completely true, we can still specify a logical slot during instance start and it will accept it. Suggestion: /* * If a logical slot name is provided in standby_slot_names, report * a message and skip it. It is possible for user to specify a * logical slot name in standby_slot_names just before the server * startup. The GUC check_hook(validate_standby_slots) can not * validate such a slot during startup as the ReplicationSlotCtl * shared memory is not initialized by that time. It is also * possible for user to drop an existing physical slot and * recreate a logical slot with the same name. */ 3. Wait for physical standby to confirm receiving the given lsn standby -->standbys 4. In StandbyConfirmedFlush(), is it better to have below errdetail in all problematic cases: Logical replication is waiting on the standby associated with \"%s\ We have it only for inactive pid case but we are waiting in all cases. thanks Shveta
On Thursday, February 29, 2024 7:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > Few additional comments on the latest patch: > > ================================= > > 1. > > static XLogRecPtr > > WalSndWaitForWal(XLogRecPtr loc) > > { > > ... > > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) { > > + /* > > + * Fast path to avoid acquiring the spinlock in case we already know > > + * we have enough WAL available and all the standby servers have > > + * confirmed receipt of WAL up to RecentFlushPtr. This is > > + particularly > > + * interesting if we're far behind. > > + */ > > return RecentFlushPtr; > > + } > > ... > > ... > > + * Wait for WALs to be flushed to disk only if the postmaster has not > > + * asked us to stop. > > + */ > > + if (loc > RecentFlushPtr && !got_STOPPING) wait_event = > > + WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > > + > > + /* > > + * Check if the standby slots have caught up to the flushed position. > > + * It is good to wait up to RecentFlushPtr and then let it send the > > + * changes to logical subscribers one by one which are already > > + covered > > + * in RecentFlushPtr without needing to wait on every change for > > + * standby confirmation. Note that after receiving the shutdown > > + signal, > > + * an ERROR is reported if any slots are dropped, invalidated, or > > + * inactive. This measure is taken to prevent the walsender from > > + * waiting indefinitely. > > + */ > > + else if (replication_active && > > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) { wait_event = > > + WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > > + wait_for_standby = true; > > + } > > + else > > + { > > + /* Already caught up and doesn't need to wait for standby_slots. */ > > break; > > + } > > ... > > } > > > > Can we try to move these checks into a separate function that returns > > a boolean and has an out parameter as wait_event? > > > > 2. How about naming StandbyConfirmedFlush() as > StandbySlotsAreCaughtup? > > > > Some more comments: > ================== > 1. > + foreach_ptr(char, name, elemlist) > + { > + ReplicationSlot *slot; > + > + slot = SearchNamedReplicationSlot(name, true); > + > + if (!slot) > + { > + GUC_check_errdetail("replication slot \"%s\" does not exist", name); > + ok = false; break; } > + > + if (!SlotIsPhysical(slot)) > + { > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > + name); ok = false; break; } } > > Won't the second check need protection via ReplicationSlotControlLock? Yes, added. > > 2. In WaitForStandbyConfirmation(), we are anyway calling > StandbyConfirmedFlush() before the actual sleep in the loop, so does calling it > at the beginning of the function will serve any purpose? If so, it is better to add > some comments explaining the same. It is used as a fast-path to avoid calling condition variable stuff, I think we can directly put failover check and list check in the beginning instead of calling that function. > > 3. Also do we need to perform the validation checks done in > StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We can > probably separate those checks and perform them just once. I have removed slot.failover check from the StandbyConfirmedFlush function, so that we can do it when necessary. 
I didn’t change the check for standby_slot_names_list because the list can change inside the loop when the config is reloaded. > > 4. > + * > + * XXX: If needed, this can be attempted in future. > > Remove this part of the comment. Removed. Attached is the V102 patch set, which addresses Amit's and Shveta's comments. Thanks to Shveta for helping address the comments off-list. Best Regards, Hou zj
Attachment
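As background for why the standby_slot_names_list check stays inside the loop (unlike the other validations that were hoisted out), the waiting code conceptually has to re-read the GUC-derived list after every wakeup, roughly as sketched below. This is a simplified illustration rather than the v102 code; wait_for_lsn stands in for whatever target LSN the caller passes, while GetStandbySlotList(), StandbyConfirmedFlush(), WalSndCtl->wal_confirm_rcv_cv and the wait event are the names used in the quoted patches:

	for (;;)
	{
		List	   *standby_slots;

		/* A SIGHUP between sleeps may have changed standby_slot_names. */
		if (ConfigReloadPending)
		{
			ConfigReloadPending = false;
			ProcessConfigFile(PGC_SIGHUP);
		}

		/* Re-fetch the slot list, since the GUC may just have changed. */
		standby_slots = GetStandbySlotList(true);

		/* Nothing (left) to wait for. */
		if (standby_slots == NIL)
			break;

		/* All slots named in standby_slot_names have confirmed the LSN. */
		if (StandbyConfirmedFlush(wait_for_lsn, WARNING))
			break;

		/*
		 * Timed sleep, so that a changed standby_slot_names is noticed even
		 * if no standby ever reports progress.
		 */
		ConditionVariableTimedSleep(&WalSndCtl->wal_confirm_rcv_cv, 1000,
									WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION);
	}

	ConditionVariableCancelSleep();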
On Thursday, February 29, 2024 7:36 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Feb 29, 2024 at 7:04 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Here is the v101 patch set which addressed above comments. > > > > Thanks for the patch. Few comments: > > 1) Shall we mention in doc that shutdown will wait for standbys in > standby_slot_names to confirm receiving WAL: > > Suggestion for logicaldecoding.sgml: > > When <varname>standby_slot_names</varname> is utilized, the primary > server will not completely shut down until the corresponding standbys, > associated with the physical replication slots specified in > <varname>standby_slot_names</varname>, have confirmed receiving the > WAL up to the latest flushed position on the primary server. > > slot.c > 2) > /* > * If a logical slot name is provided in standby_slot_names, report > * a message and skip it. Although logical slots are disallowed in > * the GUC check_hook(validate_standby_slots), it is still > * possible for a user to drop an existing physical slot and > * recreate a logical slot with the same name. > */ > > This is not completely true, we can still specify a logical slot during instance > start and it will accept it. > > Suggestion: > /* > * If a logical slot name is provided in standby_slot_names, report > * a message and skip it. It is possible for user to specify a > * logical slot name in standby_slot_names just before the server > * startup. The GUC check_hook(validate_standby_slots) can not > * validate such a slot during startup as the ReplicationSlotCtl > * shared memory is not initialized by that time. It is also > * possible for user to drop an existing physical slot and > * recreate a logical slot with the same name. > */ > > 3. Wait for physical standby to confirm receiving the given lsn > > standby -->standbys > > > 4. > In StandbyConfirmedFlush(), is it better to have below errdetail in all > problematic cases: > Logical replication is waiting on the standby associated with \"%s\ > > We have it only for inactive pid case but we are waiting in all cases. Thanks for the comments, I have addressed them. Best Regards, Hou zj
On Thursday, February 29, 2024 5:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 29, 2024 at 8:29 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > On Wed, Feb 28, 2024 at 1:23 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Tuesday, February 27, 2024 3:18 PM Peter Smith > <smithpb2250@gmail.com> wrote: > > ... > > > > 20. > > > > +# > > > > +# | ----> standby1 (primary_slot_name = sb1_slot) # | ----> standby2 > > > > +(primary_slot_name = sb2_slot) # primary ----- | # | ----> subscriber1 > > > > +(failover = true) # | ----> subscriber2 (failover = false) > > > > > > > > In the diagram, the "--->" means a mixture of physical standbys and > logical > > > > pub/sub replication. Maybe it can be a bit clearer? > > > > > > > > SUGGESTION > > > > > > > > # primary (publisher) > > > > # > > > > # (physical standbys) > > > > # | ----> standby1 (primary_slot_name = sb1_slot) > > > > # | ----> standby2 (primary_slot_name = sb2_slot) > > > > # > > > > # (logical replication) > > > > # | ----> subscriber1 (failover = true, slot_name = lsub1_slot) > > > > # | ----> subscriber2 (failover = false, slot_name = lsub2_slot) > > > > > > > > > > I think one can distinguish it based on the 'standby' and 'subscriber' as well, > because > > > 'standby' normally refer to physical standby while the other refer to logical. > > > > > I think Peter's suggestion will make the setup clear. Changed. > > > > > Ok, but shouldn't it at least include info about the logical slot > > names associated with the subscribers (slot_name = lsub1_slot, > > slot_name = lsub2_slot) like suggested above? > > > > ====== > > > > Here are some more review comments for v100-0001 > > > > ====== > > doc/src/sgml/config.sgml > > > > 1. > > + <para> > > + Lists the streaming replication standby server slot names that > logical > > + WAL sender processes will wait for. Logical WAL sender processes > will > > + send decoded changes to plugins only after the specified > replication > > + slots confirm receiving WAL. This guarantees that logical replication > > + slots with failover enabled do not consume changes until those > changes > > + are received and flushed to corresponding physical standbys. If a > > + logical replication connection is meant to switch to a physical > standby > > + after the standby is promoted, the physical replication slot for the > > + standby should be listed here. Note that logical replication will not > > + proceed if the slots specified in the standby_slot_names do > > not exist or > > + are invalidated. > > + </para> > > > > Is that note ("Note that logical replication will not proceed if the > > slots specified in the standby_slot_names do not exist or are > > invalidated") meant only for subscriptions marked for 'failover' or > > any subscription? Maybe wording can be modified to help clarify it? > > > > I think it is implicit that here we are talking about failover slots. > I think clarifying again the same could be repetitive considering the > previous sentence: "This guarantees that logical replication slots > with failover enabled do not consume .." have mentioned it. > > > ====== > > src/backend/replication/slot.c > > > > 2. > > +/* > > + * A helper function to validate slots specified in GUC standby_slot_names. 
> > + */ > > +static bool > > +validate_standby_slots(char **newval) > > +{ > > + char *rawname; > > + List *elemlist; > > + bool ok; > > + > > + /* Need a modifiable copy of string */ > > + rawname = pstrdup(*newval); > > + > > + /* Verify syntax and parse string into a list of identifiers */ > > + ok = SplitIdentifierString(rawname, ',', &elemlist); > > + > > + if (!ok) > > + { > > + GUC_check_errdetail("List syntax is invalid."); > > + } > > + > > + /* > > + * If the replication slots' data have been initialized, verify if the > > + * specified slots exist and are logical slots. > > + */ > > + else if (ReplicationSlotCtl) > > + { > > + foreach_ptr(char, name, elemlist) > > + { > > + ReplicationSlot *slot; > > + > > + slot = SearchNamedReplicationSlot(name, true); > > + > > + if (!slot) > > + { > > + GUC_check_errdetail("replication slot \"%s\" does not exist", > > + name); > > + ok = false; > > + break; > > + } > > + > > + if (!SlotIsPhysical(slot)) > > + { > > + GUC_check_errdetail("\"%s\" is not a physical replication slot", > > + name); > > + ok = false; > > + break; > > + } > > + } > > + } > > + > > + pfree(rawname); > > + list_free(elemlist); > > + return ok; > > +} > > > > 2a. > > I didn't mention this previously because I thought this function was > > not going to change anymore, but since Bertrand suggested some changes > > [1], I will say IMO the { } are fine here for the single statement, > > but I think it will be better to rearrange this code to be like below. > > Having a 2nd NOP 'else' gives a much better place where you can put > > your ReplicationSlotCtl comment. > > > > if (!ok) > > { > > GUC_check_errdetail("List syntax is invalid."); > > } > > else if (!ReplicationSlotCtl) > > { > > <Insert big comment here about why it is OK to skip when > > ReplicationSlotCtl is NULL> > > } > > else > > { > > foreach_ptr ... > > } > > > > +1. This will make the code and reasoning to skip clear. Changed. > > Few additional comments on the latest patch: > ================================= > 1. > static XLogRecPtr > WalSndWaitForWal(XLogRecPtr loc) > { > ... > + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr && > + (!replication_active || StandbyConfirmedFlush(loc, WARNING))) > + { > + /* > + * Fast path to avoid acquiring the spinlock in case we already know > + * we have enough WAL available and all the standby servers have > + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly > + * interesting if we're far behind. > + */ > return RecentFlushPtr; > + } > ... > ... > + * Wait for WALs to be flushed to disk only if the postmaster has not > + * asked us to stop. > + */ > + if (loc > RecentFlushPtr && !got_STOPPING) > + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > + > + /* > + * Check if the standby slots have caught up to the flushed position. > + * It is good to wait up to RecentFlushPtr and then let it send the > + * changes to logical subscribers one by one which are already covered > + * in RecentFlushPtr without needing to wait on every change for > + * standby confirmation. Note that after receiving the shutdown signal, > + * an ERROR is reported if any slots are dropped, invalidated, or > + * inactive. This measure is taken to prevent the walsender from > + * waiting indefinitely. 
> + */ > + else if (replication_active && > + !StandbyConfirmedFlush(RecentFlushPtr, WARNING)) > + { > + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > + wait_for_standby = true; > + } > + else > + { > + /* Already caught up and doesn't need to wait for standby_slots. */ > break; > + } > ... > } > > Can we try to move these checks into a separate function that returns > a boolean and has an out parameter as wait_event? Refactored. > > 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup? I used a similar version based on the suggested name: StandbySlotsHaveCaughtup. Best Regards, Hou zj
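To make the refactoring discussed above easier to follow, here is a rough sketch of the shape such a helper could take: a boolean result plus a wait_event out parameter, checked with early returns. This is only an illustration based on the review comments and the quoted snippets (got_STOPPING, replication_active and the two wait events come from the quoted code); the actual v102 function may differ in detail, e.g. in how shutdown is prioritized.

/*
 * Sketch: report whether the walsender still needs to wait, and why,
 * via the wait_event out parameter.
 */
static bool
NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
                 uint32 *wait_event)
{
    /* Wait for more WAL to be flushed unless we were asked to stop. */
    if (target_lsn > flushed_lsn && !got_STOPPING)
    {
        *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL;
        return true;
    }

    /* Wait for the slots in standby_slot_names to confirm the flush. */
    if (replication_active && !StandbySlotsHaveCaughtup(flushed_lsn, WARNING))
    {
        *wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION;
        return true;
    }

    *wait_event = 0;
    return false;
}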
On Wed, Feb 28, 2024 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Here is the patch which addresses the above comments. Also optimized > > the test a little bit. Now we use pg_sync_replication_slots() function > > instead of worker to test the operator-redirection using search-patch. > > This has been done to simplify the test case and reduce the added > > time. > > > > I have slightly adjusted the comments in the attached, otherwise, LGTM. This patch was pushed (commit: b3f6b14) and it resulted in a BF failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-02-29%2012%3A49%3A27 The concerned log on standby1: 2024-02-29 14:23:16.738 UTC [3908:4] 040_standby_failover_slots_sync.pl LOG: statement: SELECT pg_sync_replication_slots(); The system cannot find the file specified. 2024-02-29 14:23:16.971 UTC [3908:5] 040_standby_failover_slots_sync.pl ERROR: could not connect to the primary server: connection to server at "127.0.0.1", port 65352 failed: FATAL: SSPI authentication failed for user "repl_role" 2024-02-29 14:23:16.971 UTC [3908:6] 040_standby_failover_slots_sync.pl STATEMENT: SELECT pg_sync_replication_slots(); It seems authentication is failing for the newly added role. We also see method=sspi used in the publisher log. We are analysing it further and will share the findings. thanks Shveta
On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > Attached is the V102 patch set, which addresses Amit's and Shveta's comments. > Thanks to Shveta for helping address the comments off-list. The cfbot reported a compile warning; here is a new version of the patch which fixes it. Also removed some outdated comments in this version. Best Regards, Hou zj
Attachment
Here are some review comments for v102-0001. ====== doc/src/sgml/config.sgml 1. + <para> + Lists the streaming replication standby server slot names that logical + WAL sender processes will wait for. Logical WAL sender processes will + send decoded changes to plugins only after the specified replication + slots confirm receiving WAL. This guarantees that logical replication + slots with failover enabled do not consume changes until those changes + are received and flushed to corresponding physical standbys. If a + logical replication connection is meant to switch to a physical standby + after the standby is promoted, the physical replication slot for the + standby should be listed here. Note that logical replication will not + proceed if the slots specified in the standby_slot_names do not exist or + are invalidated. + </para> Should this also mention the effect this GUC has on those 2 SQL functions? E.g. Commit message says: Additionally, The SQL functions pg_logical_slot_get_changes and pg_replication_slot_advance are modified to wait for the replication slots mentioned in 'standby_slot_names' to catch up before returning. ====== src/backend/replication/slot.c 2. validate_standby_slots + else if (!ReplicationSlotCtl) + { + /* + * We cannot validate the replication slot if the replication slots' + * data has not been initialized. This is ok as we will validate the + * specified slot when waiting for them to catch up. See + * StandbySlotsHaveCaughtup for details. + */ + } + else + { + /* + * If the replication slots' data have been initialized, verify if the + * specified slots exist and are logical slots. + */ + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); IMO that 2nd comment does not need to say "If the replication slots' data have been initialized," because that is implicit from the if/else. ~~~ 3. GetStandbySlotList +List * +GetStandbySlotList(void) +{ + if (RecoveryInProgress()) + return NIL; + else + return standby_slot_names_list; +} The 'else' is not needed. IMO is better without it, but YMMV. ~~~ 4. StandbySlotsHaveCaughtup +/* + * Return true if the slots specified in standby_slot_names have caught up to + * the given WAL location, false otherwise. + * + * The elevel parameter determines the error level used for logging messages + * related to slots that do not exist, are invalidated, or are inactive. + */ +bool +StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel) /determines/specifies/ ~ 5. + /* + * Don't need to wait for the standby to catch up if there is no value in + * standby_slot_names. + */ + if (!standby_slot_names_list) + return true; + + /* + * If we are on a standby server, we do not need to wait for the standby to + * catch up since we do not support syncing slots to cascading standbys. + */ + if (RecoveryInProgress()) + return true; + + /* + * Return true if all the standbys have already caught up to the passed in + * WAL localtion. + */ + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && + standby_slot_oldest_flush_lsn >= wait_for_lsn) + return true; 5a. I felt all these comments should be worded in a consistent way like "Don't need to wait ..." e.g. 1. Don't need to wait for the standbys to catch up if there is no value in 'standby_slot_names'. 2. Don't need to wait for the standbys to catch up if we are on a standby server, since we do not support syncing slots to cascading standbys. 3. Don't need to wait for the standbys to catch up if they are already beyond the specified WAL location. ~ 5b. typo /WAL localtion/WAL location/ ~~~ 6. 
+ if (!slot) + { + /* + * It may happen that the slot specified in standby_slot_names GUC + * value is dropped, so let's skip over it. + */ + ereport(elevel, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("replication slot \"%s\" specified in parameter %s does not exist", + name, "standby_slot_names")); + continue; + } Is "is dropped" strictly the only reason? IIUC another reason is the slot maybe just did not even exist in the first place but it was not detected before now because inititial validation was skipped. ~~~ 7. + /* + * Return false if not all the standbys have caught up to the specified WAL + * location. + */ + if (caught_up_slot_num != list_length(standby_slot_names_list)) + return false; Somehow it seems complicated to have a counter of the slots as you process then compare that counter to the list_length to determine if one of them was skipped. Probably simpler just to set a 'skipped' flag set whenever you do 'continue'... ====== src/backend/replication/walsender.c 8. +/* + * Returns true if there are not enough WALs to be read, or if not all standbys + * have caught up to the flushed position when failover_slot is true; + * otherwise, returns false. + * + * Set prioritize_stop to true to skip waiting for WALs if the shutdown signal + * is received. + * + * Set failover_slot to true if the current acquired slot is a failover enabled + * slot and we are streaming. + * + * If returning true, the function sets the appropriate wait event in + * wait_event; otherwise, wait_event is set to 0. + */ +static bool +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, + bool prioritize_stop, bool failover_slot, + uint32 *wait_event) 8a. /Set prioritize_stop to true/Specify prioritize_stop=true/ /Set failover_slot to true/Specify failover_slot=true/ ~ 8b. Aren't the static function names typically snake_case? ~~~ 9. + /* + * Check if we need to wait for WALs to be flushed to disk. We don't need + * to wait for WALs after receiving the shutdown signal unless the + * wait_for_wal_on_stop is true. + */ + if (target_lsn > flushed_lsn && !(prioritize_stop && got_STOPPING)) + *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; The comment says 'wait_for_wal_on_stop' but the code says 'prioritize_stop' (??) ~~~ 10. + /* + * Check if we need to wait for WALs to be flushed to disk. We don't need + * to wait for WALs after receiving the shutdown signal unless the + * wait_for_wal_on_stop is true. + */ + if (target_lsn > flushed_lsn && !(prioritize_stop && got_STOPPING)) + *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; + + /* + * Check if the standby slots have caught up to the flushed position. It is + * good to wait up to RecentFlushPtr and then let it send the changes to + * logical subscribers one by one which are already covered in + * RecentFlushPtr without needing to wait on every change for standby + * confirmation. Note that after receiving the shutdown signal, an ERROR is + * reported if any slots are dropped, invalidated, or inactive. This + * measure is taken to prevent the walsender from waiting indefinitely. + */ + else if (failover_slot && !StandbySlotsHaveCaughtup(flushed_lsn, elevel)) + *wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; + else + return false; + + return true; This if/else/else seems overly difficult to read. 
IMO it will be easier if written like: SUGGESTION <comment> if (target_lsn > flushed_lsn && !(prioritize_stop && got_STOPPING)) { *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; return true; } <comment> if (failover_slot && !StandbySlotsHaveCaughtup(flushed_lsn, elevel)) { *wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; return true; } return false; ---------- Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Mar 1, 2024 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Feb 28, 2024 at 4:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Feb 28, 2024 at 3:26 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > Here is the patch which addresses the above comments. Also optimized > > > the test a little bit. Now we use pg_sync_replication_slots() function > > > instead of worker to test the operator-redirection using search-patch. > > > This has been done to simplify the test case and reduce the added > > > time. > > > > > > > I have slightly adjusted the comments in the attached, otherwise, LGTM. > > This patch was pushed (commit: b3f6b14) and it resulted in BF failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-02-29%2012%3A49%3A27 > Yeah, we forgot to allow proper authentication on Windows for non-superusers used in the test. We need to use: "auth_extra => [ '--create-role', 'repl_role' ])" in the test. See attached. I'll do some more testing with this and then push it. -- With Regards, Amit Kapila.
Attachment
On Fri, Mar 1, 2024 at 12:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > > > Attach the V102 patch set which addressed Amit and Shveta's comments. > > Thanks Shveta for helping addressing the comments off-list. > > The cfbot reported a compile warning, here is the new version patch which fixed it, > Also removed some outdate comments in this version. > Thank you for updating the patch! I've reviewed the v102-0001 patch. Here are some comments: --- I got a compiler warning: walsender.c:1829:6: warning: variable 'wait_event' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] if (!XLogRecPtrIsInvalid(RecentFlushPtr) && ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ walsender.c:1871:7: note: uninitialized use occurs here if (wait_event == WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL) ^~~~~~~~~~ walsender.c:1829:6: note: remove the '&&' if its condition is always true if (!XLogRecPtrIsInvalid(RecentFlushPtr) && ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ walsender.c:1818:20: note: initialize the variable 'wait_event' to silence this warning uint32 wait_event; ^ = 0 1 warning generated. --- +void +assign_standby_slot_names(const char *newval, void *extra) +{ + List *standby_slots; + MemoryContext oldcxt; + char *standby_slot_names_cpy = extra; + Given that the newval and extra have the same data (standby_slot_names value), why do we not use newval instead? I think that if we use newval, we don't need to guc_strdup() in check_standby_slot_names(), we might need to do list_copy_deep() instead, though. It's not clear to me as there is no comment. --- + + standby_slot_oldest_flush_lsn = min_restart_lsn; + IIUC we expect that standby_slot_oldest_flush_lsn never moves backward. If so, I think it's better to have an assertion here. --- Resetting standby_slot_names doesn't work: =# alter system set standby_slot_names to ''; ERROR: invalid value for parameter "standby_slot_names": """" DETAIL: replication slot "" does not exist --- + /* + * Switch to the memory context under which GUC variables are allocated + * (GUCMemoryContext). + */ + oldcxt = MemoryContextSwitchTo(GetMemoryChunkContext(standby_slot_names_cpy)); + standby_slot_names_list = list_copy(standby_slots); + MemoryContextSwitchTo(oldcxt); Why do we not explicitly switch to GUCMemoryContext? --- + if (!standby_slot_names_list) + return true; + Probably "standby_slot_names_list == NIL" is more consistent with other places. The same can be applied in WaitForStandbyConfirmation(). --- + /* + * Return true if all the standbys have already caught up to the passed in + * WAL localtion. + */ + s/localtion/location/ --- I was a bit surprised by the fact that standby_slot_names value is handled in a different way than a similar parameter synchronous_standby_names. For example, the following command doesn't work unless there is a replication slot 'slot1, slot2': =# alter system set standby_slot_names to 'slot1, slot2'; ERROR: invalid value for parameter "standby_slot_names": ""slot1, slot2"" DETAIL: replication slot "slot1, slot2" does not exist Whereas "alter system set synchronous_standby_names to stb1, stb2;" can correctly separate the string into 'stb1' and 'stb2'. Probably it would be okay since this behavior of standby_slot_names is the same as other GUC parameters that accept a comma-separated string. But I was confused a bit the first time I used it. 
--- + /* + * "*" is not accepted as in that case primary will not be able to know + * for which all standbys to wait for. Even if we have physical slots + * info, there is no way to confirm whether there is any standby + * configured for the known physical slots. + */ + if (strcmp(*newval, "*") == 0) + { + GUC_check_errdetail("\"*\" is not accepted for standby_slot_names"); + return false; + } Why only '*' is checked aside from validate_standby_slots()? I think that the doc doesn't mention anything about '*' and '*' cannot be used as a replication slot name. So even if we don't have this check, it might be no problem. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > ... > + /* > + * "*" is not accepted as in that case primary will not be able to know > + * for which all standbys to wait for. Even if we have physical slots > + * info, there is no way to confirm whether there is any standby > + * configured for the known physical slots. > + */ > + if (strcmp(*newval, "*") == 0) > + { > + GUC_check_errdetail("\"*\" is not accepted for > standby_slot_names"); > + return false; > + } > > Why only '*' is checked aside from validate_standby_slots()? I think > that the doc doesn't mention anything about '*' and '*' cannot be used > as a replication slot name. So even if we don't have this check, it > might be no problem. > Hi, a while ago I asked this same question. See [1 #28] for the response.. ---------- [1] https://www.postgresql.org/message-id/OS0PR01MB571646B8186F6A06404BD50194BDA%40OS0PR01MB5716.jpnprd01.prod.outlook.com Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Mar 1, 2024 at 9:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v102-0001. > > > 7. > + /* > + * Return false if not all the standbys have caught up to the specified WAL > + * location. > + */ > + if (caught_up_slot_num != list_length(standby_slot_names_list)) > + return false; > > Somehow it seems complicated to have a counter of the slots as you > process then compare that counter to the list_length to determine if > one of them was skipped. > > Probably simpler just to set a 'skipped' flag set whenever you do 'continue'... > The other thing is do we need to continue when we find some slot that can't be processed? If not, then we can simply set the boolean flag, break the loop, and return false if the boolean is set after releasing the LWLock. The other way is we simply release lock whenever we need to skip the slot and just return false. -- With Regards, Amit Kapila.
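To illustrate the simplification being suggested here, the loop could bail out on the first slot that cannot be waited on, releasing the lock before returning, instead of counting the slots that did catch up. A rough sketch follows; the exact per-slot conditions (in_use, restart_lsn) are assumptions modelled on the quoted patch rather than a copy of it.

/*
 * Sketch of an early-exit variant of StandbySlotsHaveCaughtup(): return
 * false as soon as one slot in standby_slot_names cannot be waited on or
 * has not yet reached wait_for_lsn.
 */
static bool
StandbySlotsHaveCaughtupSketch(XLogRecPtr wait_for_lsn)
{
    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

    foreach_ptr(char, name, standby_slot_names_list)
    {
        ReplicationSlot *slot = SearchNamedReplicationSlot(name, false);
        XLogRecPtr  restart_lsn;

        /* dropped, recreated as logical, or otherwise unusable slot */
        if (!slot || !slot->in_use || SlotIsLogical(slot))
        {
            LWLockRelease(ReplicationSlotControlLock);
            return false;
        }

        SpinLockAcquire(&slot->mutex);
        restart_lsn = slot->data.restart_lsn;
        SpinLockRelease(&slot->mutex);

        if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn < wait_for_lsn)
        {
            LWLockRelease(ReplicationSlotControlLock);
            return false;
        }
    }

    LWLockRelease(ReplicationSlotControlLock);
    return true;
}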
Hi, On Fri, Mar 01, 2024 at 03:22:55PM +1100, Peter Smith wrote: > Here are some review comments for v102-0001. > > ====== > doc/src/sgml/config.sgml > > 1. > + <para> > + Lists the streaming replication standby server slot names that logical > + WAL sender processes will wait for. Logical WAL sender processes will > + send decoded changes to plugins only after the specified replication > + slots confirm receiving WAL. This guarantees that logical replication > + slots with failover enabled do not consume changes until those changes > + are received and flushed to corresponding physical standbys. If a > + logical replication connection is meant to switch to a physical standby > + after the standby is promoted, the physical replication slot for the > + standby should be listed here. Note that logical replication will not > + proceed if the slots specified in the standby_slot_names do > not exist or > + are invalidated. > + </para> > > Should this also mention the effect this GUC has on those 2 SQL > functions? E.g. Commit message says: > > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication > slots mentioned in 'standby_slot_names' to catch up before returning. I think that's also true for all the ones that rely on pg_logical_slot_get_changes_guts(), means: - pg_logical_slot_get_changes - pg_logical_slot_peek_changes - pg_logical_slot_get_binary_changes - pg_logical_slot_peek_binary_changes Not sure it's worth to mention the "binary" ones though as their doc mention they behave as their "non binary" counterpart. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Friday, March 1, 2024 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Mar 1, 2024 at 12:42 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, March 1, 2024 10:17 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > > Attach the V102 patch set which addressed Amit and Shveta's comments. > > > Thanks Shveta for helping addressing the comments off-list. > > > > The cfbot reported a compile warning, here is the new version patch > > which fixed it, Also removed some outdate comments in this version. > > > > I've reviewed the v102-0001 patch. Here are some comments: Thanks for the comments ! > > --- > I got a compiler warning: > > walsender.c:1829:6: warning: variable 'wait_event' is used uninitialized > whenever '&&' condition is false [-Wsometimes-uninitialized] > if (!XLogRecPtrIsInvalid(RecentFlushPtr) && > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > walsender.c:1871:7: note: uninitialized use occurs here > if (wait_event == > WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL) > ^~~~~~~~~~ > walsender.c:1829:6: note: remove the '&&' if its condition is always true > if (!XLogRecPtrIsInvalid(RecentFlushPtr) && > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > walsender.c:1818:20: note: initialize the variable 'wait_event' to silence this > warning > uint32 wait_event; > ^ > = 0 > 1 warning generated. Thanks for reporting, it was fixed in V102_2. > > --- > +void > +assign_standby_slot_names(const char *newval, void *extra) { > + List *standby_slots; > + MemoryContext oldcxt; > + char *standby_slot_names_cpy = extra; > + > > Given that the newval and extra have the same data (standby_slot_names > value), why do we not use newval instead? I think that if we use > newval, we don't need to guc_strdup() in check_standby_slot_names(), > we might need to do list_copy_deep() instead, though. It's not clear > to me as there is no comment. I think SplitIdentifierString will modify the passed in string, so we'd better not pass the newval to it, otherwise the stored guc string(standby_slot_names) will be changed. I can see we are doing similar thing in other GUC check/assign function as well. (check_wal_consistency_checking/ assign_wal_consistency_checking, check_createrole_self_grant/ assign_createrole_self_grant ...). > --- > + /* > + * Switch to the memory context under which GUC variables are > allocated > + * (GUCMemoryContext). > + */ > + oldcxt = > MemoryContextSwitchTo(GetMemoryChunkContext(standby_slot_names_cpy > )); > + standby_slot_names_list = list_copy(standby_slots); > + MemoryContextSwitchTo(oldcxt); > > Why do we not explicitly switch to GUCMemoryContext? I think it's because the GUCMemoryContext is not exposed. Best Regards, Hou zj
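For anyone not familiar with the GUC hook pattern being discussed: SplitIdentifierString() writes into the string it is given (it truncates it at each separator), which is why the check hook works on a pstrdup() copy rather than on *newval, and hands a separate guc_strdup() copy to the assign hook via *extra. A stripped-down sketch is below; the per-slot validation is omitted and check_slot_names_sketch is a made-up name.

/* Sketch of the check-hook pattern described above. */
static bool
check_slot_names_sketch(char **newval, void **extra, GucSource source)
{
    char       *rawname = pstrdup(*newval); /* modifiable copy */
    List       *elemlist;
    bool        ok;

    /* SplitIdentifierString() scribbles on rawname, not on *newval */
    ok = SplitIdentifierString(rawname, ',', &elemlist);
    if (!ok)
        GUC_check_errdetail("List syntax is invalid.");

    /* ... per-slot validation of elemlist would go here ... */

    /* give the assign hook its own long-lived copy of the raw value */
    if (ok)
        *extra = guc_strdup(ERROR, *newval);

    pfree(rawname);
    list_free(elemlist);
    return ok;
}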
On Thu, Feb 29, 2024 at 12:34 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
On Monday, February 26, 2024 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> wrote:
> >
> > Attach the V98 patch set which addressed above comments.
> >
>
> Few comments:
> =============
> 1.
> WalSndWaitForWal(XLogRecPtr loc)
> {
> int wakeEvents;
> + bool wait_for_standby = false;
> + uint32 wait_event;
> + List *standby_slots = NIL;
> static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
>
> + if (MyReplicationSlot->data.failover && replication_active)
> + standby_slots = GetStandbySlotList(true);
> +
> /*
> - * Fast path to avoid acquiring the spinlock in case we already know we
> - * have enough WAL available. This is particularly interesting if we're
> - * far behind.
> + * Check if all the standby servers have confirmed receipt of WAL up to
> + * RecentFlushPtr even when we already know we have enough WAL available.
> + *
> + * Note that we cannot directly return without checking the status of
> + * standby servers because the standby_slot_names may have changed,
> + which
> + * means there could be new standby slots in the list that have not yet
> + * caught up to the RecentFlushPtr.
> */
> - if (RecentFlushPtr != InvalidXLogRecPtr &&
> - loc <= RecentFlushPtr)
> - return RecentFlushPtr;
> + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr) {
> + FilterStandbySlots(RecentFlushPtr, &standby_slots);
>
> I think even if the slot list is not changed, we will always process each slot
> mentioned in standby_slot_names once. Can't we cache the previous list of
> slots for we have already waited for? In that case, we won't even need to copy
> the list via GetStandbySlotList() unless we need to wait.
>
> 2.
> WalSndWaitForWal(XLogRecPtr loc)
> {
> + /*
> + * Update the standby slots that have not yet caught up to the flushed
> + * position. It is good to wait up to RecentFlushPtr and then let it
> + * send the changes to logical subscribers one by one which are
> + * already covered in RecentFlushPtr without needing to wait on every
> + * change for standby confirmation.
> + */
> + if (wait_for_standby)
> + FilterStandbySlots(RecentFlushPtr, &standby_slots);
> +
> /* Update our idea of the currently flushed position. */
> - if (!RecoveryInProgress())
> + else if (!RecoveryInProgress())
> RecentFlushPtr = GetFlushRecPtr(NULL);
> else
> RecentFlushPtr = GetXLogReplayRecPtr(NULL); ...
> /*
> * If postmaster asked us to stop, don't wait anymore.
> *
> * It's important to do this check after the recomputation of
> * RecentFlushPtr, so we can send all remaining data before shutting
> * down.
> */
> if (got_STOPPING)
> break;
>
> I think because 'wait_for_standby' may not be set in the first or consecutive
> cycles we may send the WAL to the logical subscriber before sending it to the
> physical subscriber during shutdown.
Here is the v101 patch set which addresses the above comments.
This version caches the oldest standby slot's LSN each time we wait for
them to catch up. The cached LSN is invalidated when the GUC config is reloaded.
In the WalSndWaitForWal function, instead of traversing the entire standby list
each time, we can check the cached LSN to quickly determine if the standbys
have caught up. When a shutdown signal is received, we continue to wait for the
standby slots to catch up. When waiting for the standbys to catch up after
receiving the shutdown signal, an ERROR is reported if any slots are dropped,
invalidated, or inactive. This measure is taken to prevent the walsender from
waiting indefinitely.
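For readers skimming the quoted description, the caching it describes boils down to roughly the sketch below. The variable name standby_slot_oldest_flush_lsn matches the one discussed elsewhere in this thread; the helper name is made up, and in later versions of the patch this check sits inside StandbySlotsHaveCaughtup().

/*
 * Sketch of the caching described above: remember the oldest confirmed
 * LSN among the slots in standby_slot_names, and reset it to
 * InvalidXLogRecPtr whenever the GUC is reloaded.  WalSndWaitForWal()
 * can then skip walking the slot list when the cached value already
 * covers the location it is waiting for.
 */
static XLogRecPtr standby_slot_oldest_flush_lsn = InvalidXLogRecPtr;

static bool
StandbysKnownCaughtUp(XLogRecPtr wait_for_lsn)
{
    return !XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) &&
        standby_slot_oldest_flush_lsn >= wait_for_lsn;
}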
Thanks for the patch.
I ran some performance tests on patch v101 with synchronous_commit turned on to check how much the logical replication changes affect transaction speed on the primary compared to HEAD. In all configurations there is a primary, a logical subscriber and a physical standby, and the logical subscriber is listed in synchronous_standby_names. This means transactions on the primary will not be committed until the logical subscriber has confirmed receipt of the transaction.
Machine details:
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, 800GB RAM
My additional configuration on each instance is:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
All tests are done using pgbench running for 15 minutes:
Creating tables: pgbench -p 6972 postgres -qis 2
Running benchmark: pgbench postgres -p 6972 -c 10 -j 3 -T 900 -P 5
HEAD code-
Primary with Synchronous_commit=on, physical standby with hot_standby_feedback=off
RUN1 (TPS) RUN2 (TPS) AVERAGE (TPS)
8.226658 8.17815 8.202404
HEAD code-
Primary with Synchronous_commit=on, physical standby with hot_standby_feedback=on
RUN1 (TPS) RUN2 (TPS) AVERAGE (TPS)
8.134901 8.229066 8.1819835 -- degradation from first config -0.25%
PATCHED code - (v101-0001)
Primary with synchronous_commit=on, physical standby with hot_standby_feedback=on, standby_slot_names not configured, logical subscriber not failover enabled, physical standby not configured for sync
RUN1 (TPS) RUN2 (TPS) AVERAGE (TPS)
8.18839 8.18839 8.18839 -- degradation from first config -0.17%
PATCHED code - (v98-0001)
Synchronous_commit=on, hot_standby_feedback=on, standby_slot_names configured to physical standby, logical subscriber failover enabled, physical standby configured for sync
RUN1 (TPS) RUN2 (TPS) AVERAGE (TPS)
8.173062 8.068536 8.120799 -- degradation from first config -0.99%
Overall, I do not see any significant performance degradation with the patch and sync-slot enabled with one logical subscriber and one physical standby.
Attaching script for my final test configuration for reference.
regards,
Ajin Cherian
Fujitsu Australia
Attachment
On Fri, Mar 1, 2024 at 11:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Mar 1, 2024 at 12:42 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > --- > I was a bit surprised by the fact that standby_slot_names value is > handled in a different way than a similar parameter > synchronous_standby_names. For example, the following command doesn't > work unless there is a replication slot 'slot1, slot2': > > =# alter system set standby_slot_names to 'slot1, slot2'; > ERROR: invalid value for parameter "standby_slot_names": ""slot1, slot2"" > DETAIL: replication slot "slot1, slot2" does not exist > > Whereas "alter system set synchronous_standby_names to stb1, stb2;" > can correctly separate the string into 'stb1' and 'stb2'. > > Probably it would be okay since this behavior of standby_slot_names is > the same as other GUC parameters that accept a comma-separated string. > But I was confused a bit the first time I used it. > I think it is better to keep the behavior in this regard the same as 'synchronous_standby_names' because both have similarities w.r.t replication. -- With Regards, Amit Kapila.
On Friday, March 1, 2024 12:23 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v102-0001. > > ====== > doc/src/sgml/config.sgml > > 1. > + <para> > + Lists the streaming replication standby server slot names that logical > + WAL sender processes will wait for. Logical WAL sender processes will > + send decoded changes to plugins only after the specified replication > + slots confirm receiving WAL. This guarantees that logical replication > + slots with failover enabled do not consume changes until those > changes > + are received and flushed to corresponding physical standbys. If a > + logical replication connection is meant to switch to a physical standby > + after the standby is promoted, the physical replication slot for the > + standby should be listed here. Note that logical replication will not > + proceed if the slots specified in the standby_slot_names do > not exist or > + are invalidated. > + </para> > > Should this also mention the effect this GUC has on those 2 SQL functions? E.g. > Commit message says: > > Additionally, The SQL functions pg_logical_slot_get_changes and > pg_replication_slot_advance are modified to wait for the replication slots > mentioned in 'standby_slot_names' to catch up before returning. Added. > > ====== > src/backend/replication/slot.c > > 2. validate_standby_slots > > + else if (!ReplicationSlotCtl) > + { > + /* > + * We cannot validate the replication slot if the replication slots' > + * data has not been initialized. This is ok as we will validate the > + * specified slot when waiting for them to catch up. See > + * StandbySlotsHaveCaughtup for details. > + */ > + } > + else > + { > + /* > + * If the replication slots' data have been initialized, verify if the > + * specified slots exist and are logical slots. > + */ > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > IMO that 2nd comment does not need to say "If the replication slots' > data have been initialized," because that is implicit from the if/else. Removed. > > ~~~ > > 3. GetStandbySlotList > > +List * > +GetStandbySlotList(void) > +{ > + if (RecoveryInProgress()) > + return NIL; > + else > + return standby_slot_names_list; > +} > > The 'else' is not needed. IMO is better without it, but YMMV. Removed. > > ~~~ > > 4. StandbySlotsHaveCaughtup > > +/* > + * Return true if the slots specified in standby_slot_names have caught > +up to > + * the given WAL location, false otherwise. > + * > + * The elevel parameter determines the error level used for logging > +messages > + * related to slots that do not exist, are invalidated, or are inactive. > + */ > +bool > +StandbySlotsHaveCaughtup(XLogRecPtr wait_for_lsn, int elevel) > > /determines/specifies/ Changed. > > ~ > > 5. > + /* > + * Don't need to wait for the standby to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (!standby_slot_names_list) > + return true; > + > + /* > + * If we are on a standby server, we do not need to wait for the > + standby to > + * catch up since we do not support syncing slots to cascading standbys. > + */ > + if (RecoveryInProgress()) > + return true; > + > + /* > + * Return true if all the standbys have already caught up to the passed > + in > + * WAL localtion. > + */ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) return true; > > > 5a. > I felt all these comments should be worded in a consistent way like "Don't need > to wait ..." > > e.g. > 1. 
Don't need to wait for the standbys to catch up if there is no value in > 'standby_slot_names'. > 2. Don't need to wait for the standbys to catch up if we are on a standby server, > since we do not support syncing slots to cascading standbys. > 3. Don't need to wait for the standbys to catch up if they are already beyond > the specified WAL location. Changed. > > ~ > > 5b. > typo > /WAL localtion/WAL location/ > Fixed. > ~~~ > > 6. > + if (!slot) > + { > + /* > + * It may happen that the slot specified in standby_slot_names GUC > + * value is dropped, so let's skip over it. > + */ > + ereport(elevel, > + errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("replication slot \"%s\" specified in parameter %s does not exist", > + name, "standby_slot_names")); > + continue; > + } > > Is "is dropped" strictly the only reason? IIUC another reason is the slot maybe > just did not even exist in the first place but it was not detected before now > because inititial validation was skipped. Changed the comment. > > ~~~ > > 7. > + /* > + * Return false if not all the standbys have caught up to the specified > + WAL > + * location. > + */ > + if (caught_up_slot_num != list_length(standby_slot_names_list)) > + return false; > > Somehow it seems complicated to have a counter of the slots as you process > then compare that counter to the list_length to determine if one of them was > skipped. > > Probably simpler just to set a 'skipped' flag set whenever you do 'continue'... > I prefer the current style because we need to set skipped =true in multiple places which doesn't seem better to me. > ====== > src/backend/replication/walsender.c > > 8. > +/* > + * Returns true if there are not enough WALs to be read, or if not all > +standbys > + * have caught up to the flushed position when failover_slot is true; > + * otherwise, returns false. > + * > + * Set prioritize_stop to true to skip waiting for WALs if the shutdown > +signal > + * is received. > + * > + * Set failover_slot to true if the current acquired slot is a failover > +enabled > + * slot and we are streaming. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, bool > +prioritize_stop, bool failover_slot, > + uint32 *wait_event) > > 8a. > /Set prioritize_stop to true/Specify prioritize_stop=true/ > > /Set failover_slot to true/Specify failover_slot=true/ This function has been refactored a bit. > > ~ > > 8b. > Aren't the static function names typically snake_case? I think the current name style is more consistent with the other functions in walsender.c. > > ~~~ > > 9. > + /* > + * Check if we need to wait for WALs to be flushed to disk. We don't > + need > + * to wait for WALs after receiving the shutdown signal unless the > + * wait_for_wal_on_stop is true. > + */ > + if (target_lsn > flushed_lsn && !(prioritize_stop && got_STOPPING)) > + *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > > The comment says 'wait_for_wal_on_stop' but the code says 'prioritize_stop' > (??) This has been removed in last version. > ~~~ > > 10. > + /* > + * Check if we need to wait for WALs to be flushed to disk. We don't > + need > + * to wait for WALs after receiving the shutdown signal unless the > + * wait_for_wal_on_stop is true. 
> + */ > + if (target_lsn > flushed_lsn && !(prioritize_stop && got_STOPPING)) > + *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > + > + /* > + * Check if the standby slots have caught up to the flushed position. > + It is > + * good to wait up to RecentFlushPtr and then let it send the changes > + to > + * logical subscribers one by one which are already covered in > + * RecentFlushPtr without needing to wait on every change for standby > + * confirmation. Note that after receiving the shutdown signal, an > + ERROR is > + * reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + else if (failover_slot && !StandbySlotsHaveCaughtup(flushed_lsn, > + elevel)) *wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > + else > + return false; > + > + return true; > > This if/else/else seems overly difficult to read. IMO it will be easier if written > like: > > SUGGESTION > <comment> > if (target_lsn > flushed_lsn && !(prioritize_stop && got_STOPPING)) { > *wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL; > return true; > } > > <comment> > if (failover_slot && !StandbySlotsHaveCaughtup(flushed_lsn, elevel)) { > *wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION; > return true; > } > > return false; Changed. Attach the V103 patch set which addressed above comments and Sawada-san's comment[1]. Apart from the comments, the code in WalSndWaitForWal was refactored a bit to make it neater. Thanks Shveta for helping writing the code and doc. [1] https://www.postgresql.org/message-id/CAD21AoBhty79zHgXYMNHH1KqO2VtmjRi22QPmYP2yaHC9WFVdw%40mail.gmail.com Best Regards, Hou zj
Attachment
On Thursday, February 29, 2024 11:16 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, February 28, 2024 7:36 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > 4 === > > > > Regarding the test, what about adding one to test the "new" behavior > > discussed up-thread? (logical replication will wait if slot mentioned > > in standby_slot_names is dropped and/or does not exist when the engine > > starts?) > > Will think about this. I think we could add a test to check the warning message for a dropped slot, but since similar wait/warning functionality is already tested, I prefer to leave this for now and consider it again after the main patch and tests have stabilized, given the previous experience of buildfarm (BF) instability with this feature. Best Regards, Hou zj
On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Apart from the comments, the code in WalSndWaitForWal was refactored > a bit to make it neater. Thanks Shveta for helping writing the code and doc. > A few more comments: ================== 1. +# Wait until the primary server logs a warning indicating that it is waiting +# for the sb1_slot to catch up. +$primary->wait_for_log( + qr/replication slot \"sb1_slot\" specified in parameter standby_slot_names does not have active_pid/, + $offset); Shouldn't we wait for such a LOG even in the first test as well which involves two standbys and two logical subscribers? 2. +################################################## +# Test that logical replication will wait for the user-created inactive +# physical slot to catch up until we remove the slot from standby_slot_names. +################################################## I don't see anything different tested in this test from what we already tested in the first test involving two standbys and two logical subscribers. Can you please clarify if I am missing something? 3. Note that after receiving the shutdown signal, an ERROR + * is reported if any slots are dropped, invalidated, or inactive. This + * measure is taken to prevent the walsender from waiting indefinitely. + */ + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) Isn't this part of the comment should be moved inside NeedToWaitForStandby()? 4. + /* + * Update our idea of the currently flushed position only if we are + * not waiting for standbys to catch up, otherwise the standby would + * have to catch up to a newer WAL location in each cycle. + */ + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) + { This functionality (in function WalSndWaitForWal()) seems to ensure that we first wait for the required WAL to be flushed and then wait for standbys. If true, we should cover that point in the comments here or somewhere in the function WalSndWaitForWal(). Apart from this, I have made a few modifications in the comments. -- With Regards, Amit Kapila.
Attachment
On Sat, Mar 2, 2024 at 2:51 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, March 1, 2024 12:23 PM Peter Smith <smithpb2250@gmail.com> wrote: > > ... > > ====== > > src/backend/replication/slot.c > > > > 2. validate_standby_slots > > > > + else if (!ReplicationSlotCtl) > > + { > > + /* > > + * We cannot validate the replication slot if the replication slots' > > + * data has not been initialized. This is ok as we will validate the > > + * specified slot when waiting for them to catch up. See > > + * StandbySlotsHaveCaughtup for details. > > + */ > > + } > > + else > > + { > > + /* > > + * If the replication slots' data have been initialized, verify if the > > + * specified slots exist and are logical slots. > > + */ > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > > > IMO that 2nd comment does not need to say "If the replication slots' > > data have been initialized," because that is implicit from the if/else. > > Removed. I only meant to suggest removing the 1st part, not the entire comment. I thought it is still useful to have a comment like: /* Check that the specified slots exist and are logical slots.*/ ====== And, here are some review comments for v103-0001. ====== Commit message 1. Additionally, The SQL functions pg_logical_slot_get_changes, pg_logical_slot_peek_changes and pg_replication_slot_advance are modified to wait for the replication slots mentioned in 'standby_slot_names' to catch up before returning. ~ (use the same wording as previous in this message) /mentioned in/specified in/ ====== doc/src/sgml/config.sgml 2. + Additionally, when using the replication management functions + <link linkend="pg-replication-slot-advance"> + <function>pg_replication_slot_advance</function></link>, + <link linkend="pg-logical-slot-get-changes"> + <function>pg_logical_slot_get_changes</function></link>, and + <link linkend="pg-logical-slot-peek-changes"> + <function>pg_logical_slot_peek_changes</function></link>, + with failover enabled logical slots, the functions will wait for the + physical slots specified in <varname>standby_slot_names</varname> to + confirm WAL receipt before proceeding. + </para> It says "the ... functions" twice so maybe reword it slightly. BEFORE Additionally, when using the replication management functions pg_replication_slot_advance, pg_logical_slot_get_changes, and pg_logical_slot_peek_changes, with failover enabled logical slots, the functions will wait for the physical slots specified in standby_slot_names to confirm WAL receipt before proceeding. SUGGESTION Additionally, the replication management functions pg_replication_slot_advance, pg_logical_slot_get_changes, and pg_logical_slot_peek_changes, when used with failover enabled logical slots, will wait for the physical slots specified in standby_slot_names to confirm WAL receipt before proceeding. (Actually the "will wait ... before proceeding" is also a bit tricky, so below is another possible rewording) SUGGESTION #2 Additionally, the replication management functions pg_replication_slot_advance, pg_logical_slot_get_changes, and pg_logical_slot_peek_changes, when used with failover enabled logical slots, will block until all physical slots specified in standby_slot_names have confirmed WAL receipt. ~~~ 3. + <note> + <para> + Value <literal>*</literal> is not accepted as it is inappropriate to + block logical replication for physical slots that either lack + associated standbys or have standbys associated that are not enabled + for replication slot synchronization. 
(see + <xref linkend="logicaldecoding-replication-slots-synchronization"/>). + </para> + </note> Why does the document need to provide an excuse/reason for the rule? You could just say something like: SUGGESTION The slots must be named explicitly. For example, specifying wildcard values like <literal>*</literal> is not permitted. ====== doc/src/sgml/func.sgml 4. @@ -28150,7 +28150,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset </row> <row> - <entry role="func_table_entry"><para role="func_signature"> + <entry id="pg-logical-slot-get-changes" role="func_table_entry"><para role="func_signature"> <indexterm> <primary>pg_logical_slot_get_changes</primary> </indexterm> @@ -28177,7 +28177,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset </row> <row> - <entry role="func_table_entry"><para role="func_signature"> + <entry id="pg-logical-slot-peek-changes" role="func_table_entry"><para role="func_signature"> <indexterm> <primary>pg_logical_slot_peek_changes</primary> </indexterm> @@ -28232,7 +28232,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset </row> <row> - <entry role="func_table_entry"><para role="func_signature"> + <entry id="pg-replication-slot-advance" role="func_table_entry"><para role="func_signature"> <indexterm> <primary>pg_replication_slot_advance</primary> </indexterm> Should these 3 functions say something about how their behaviour is affected by 'standby_slot_names' and give a link back to the GUC 'standby_slot_names' docs? ====== src/backend/replication/slot.c 5. StandbySlotsHaveCaughtup + if (!slot) + { + /* + * If the provided slot does not exist, report a message and exit + * the loop. It is possible for a user to specify a slot name in + * standby_slot_names that does not exist just before the server + * startup. The GUC check_hook(validate_standby_slots) cannot + * validate such a slot during startup as the ReplicationSlotCtl + * shared memory is not initialized at that time. It is also + * possible for a user to drop the slot in standby_slot_names + * afterwards. + */ 5a. Minor rewording to make this code comment more similar to the next one: SUGGESTION If a slot name provided in standby_slot_names does not exist, report a message and exit the loop. A user can specify a slot name that does not exist just before the server startup... ~ 5b. + /* + * If a logical slot name is provided in standby_slot_names, + * report a message and exit the loop. Similar to the non-existent + * case, it is possible for a user to specify a logical slot name + * in standby_slot_names before the server startup, or drop an + * existing physical slot and recreate a logical slot with the + * same name. + */ /it is possible for a user to specify/a user can specify/ ~~~ 6. WaitForStandbyConfirmation + /* + * We wait for the slots in the standby_slot_names to catch up, but we + * use a timeout (1s) so we can also check if the standby_slot_names + * has been changed. + */ Remove some of the "we". SUGGESTION Wait for the slots in the standby_slot_names to catch up, but use a timeout (1s) so we can also check if the standby_slot_names has been changed. ====== src/backend/replication/walsender.c 7. NeedToWaitForStandby +/* + * Returns true if not all standbys have caught up to the flushed position + * (flushed_lsn) when failover_slot is true; otherwise, returns false. + * + * If returning true, the function sets the appropriate wait event in + * wait_event; otherwise, wait_event is set to 0. 
+ */ The function comment refers to 'failover_slot' but IMO needs to be worded differently because 'failover_slot' is not a parameter anymore. ~~~ 8. NeedToWaitForWal +/* + * Returns true if we need to wait for WALs to be flushed to disk, or if not + * all standbys have caught up to the flushed position (flushed_lsn) when + * failover_slot is true; otherwise, returns false. + * + * If returning true, the function sets the appropriate wait event in + * wait_event; otherwise, wait_event is set to 0. + */ +static bool +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, + uint32 *wait_event) Same as above -- Now that 'failover_slot' is not a function parameter, I thought this should be reworded. ~~~ 9. NeedToWaitForWal + /* + * Check if the standby slots have caught up to the flushed position. It + * is good to wait up to flushed position and then let it send the changes + * to logical subscribers one by one which are already covered in flushed + * position without needing to wait on every change for standby + * confirmation. Note that after receiving the shutdown signal, an ERROR + * is reported if any slots are dropped, invalidated, or inactive. This + * measure is taken to prevent the walsender from waiting indefinitely. + */ + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) + return true; I felt it was confusing things for this function to also call to the other one -- it seems an overlapping/muddling of the purpose of these. I think it will be easier to understand if the calling code just calls to one or both of these functions as required. ~~~ 10. - WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL); + WalSndWait(wakeEvents, sleeptime, wait_event); Tracing the assignments of the 'wait_event' is a bit tricky... IIUC it should not be 0 when we got to this point, so maybe it is better to put Assert(wait_event) before this call? ---------- Kind Regards, Peter Smith. Fujitsu Australia
On Sun, Mar 3, 2024 at 5:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ~~~ > > 3. > + <note> > + <para> > + Value <literal>*</literal> is not accepted as it is inappropriate to > + block logical replication for physical slots that either lack > + associated standbys or have standbys associated that are not enabled > + for replication slot synchronization. (see > + <xref linkend="logicaldecoding-replication-slots-synchronization"/>). > + </para> > + </note> > > Why does the document need to provide an excuse/reason for the rule? > You could just say something like: > > SUGGESTION > The slots must be named explicitly. For example, specifying wildcard > values like <literal>*</literal> is not permitted. > I would like to document the reason somewhere, if not in docs, then let's write a comment for the same in code. > ====== > > ~~~ > > 9. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. It > + * is good to wait up to flushed position and then let it send the changes > + * to logical subscribers one by one which are already covered in flushed > + * position without needing to wait on every change for standby > + * confirmation. Note that after receiving the shutdown signal, an ERROR > + * is reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > + return true; > > I felt it was confusing things for this function to also call to the > other one -- it seems an overlapping/muddling of the purpose of these. > I think it will be easier to understand if the calling code just calls > to one or both of these functions as required. > I felt otherwise because the caller has to call these functions at more than one place which makes the caller's code difficult to follow. It is better to encapsulate the computation of wait_event. -- With Regards, Amit Kapila.
On Sunday, March 3, 2024 7:47 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Mar 2, 2024 at 2:51 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, March 1, 2024 12:23 PM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > ... > > > ====== > > > src/backend/replication/slot.c > > > > > > 2. validate_standby_slots > > > > > > + else if (!ReplicationSlotCtl) > > > + { > > > + /* > > > + * We cannot validate the replication slot if the replication slots' > > > + * data has not been initialized. This is ok as we will validate > > > + the > > > + * specified slot when waiting for them to catch up. See > > > + * StandbySlotsHaveCaughtup for details. > > > + */ > > > + } > > > + else > > > + { > > > + /* > > > + * If the replication slots' data have been initialized, verify if > > > + the > > > + * specified slots exist and are logical slots. > > > + */ > > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > > > > > IMO that 2nd comment does not need to say "If the replication slots' > > > data have been initialized," because that is implicit from the if/else. > > > > Removed. > > I only meant to suggest removing the 1st part, not the entire comment. > I thought it is still useful to have a comment like: > > /* Check that the specified slots exist and are logical slots.*/ OK, I misunderstood. Fixed. > > ====== > > And, here are some review comments for v103-0001. Thanks for the comments. > > ====== > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots mentioned in 'standby_slot_names' to catch up > before returning. > > ~ > > (use the same wording as previous in this message) > > /mentioned in/specified in/ Changed. > > ====== > doc/src/sgml/config.sgml > > 2. > + Additionally, when using the replication management functions > + <link linkend="pg-replication-slot-advance"> > + <function>pg_replication_slot_advance</function></link>, > + <link linkend="pg-logical-slot-get-changes"> > + <function>pg_logical_slot_get_changes</function></link>, and > + <link linkend="pg-logical-slot-peek-changes"> > + <function>pg_logical_slot_peek_changes</function></link>, > + with failover enabled logical slots, the functions will wait for the > + physical slots specified in > <varname>standby_slot_names</varname> to > + confirm WAL receipt before proceeding. > + </para> > > It says "the ... functions" twice so maybe reword it slightly. > > BEFORE > Additionally, when using the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, with failover enabled logical slots, the functions > will wait for the physical slots specified in standby_slot_names to confirm WAL > receipt before proceeding. > > SUGGESTION > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will wait for the physical slots specified in standby_slot_names to confirm WAL > receipt before proceeding. > > (Actually the "will wait ... 
before proceeding" is also a bit tricky, so below is > another possible rewording) > > SUGGESTION #2 > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will block until all physical slots specified in standby_slot_names have > confirmed WAL receipt. I prefer the #2 version. > > ~~~ > > 3. > + <note> > + <para> > + Value <literal>*</literal> is not accepted as it is inappropriate to > + block logical replication for physical slots that either lack > + associated standbys or have standbys associated that are not > enabled > + for replication slot synchronization. (see > + <xref > linkend="logicaldecoding-replication-slots-synchronization"/>). > + </para> > + </note> > > Why does the document need to provide an excuse/reason for the rule? > You could just say something like: > > SUGGESTION > The slots must be named explicitly. For example, specifying wildcard values like > <literal>*</literal> is not permitted. As suggested by Amit, I moved this to code comments. > > ====== > doc/src/sgml/func.sgml > > 4. > @@ -28150,7 +28150,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="pg-logical-slot-get-changes" > role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>pg_logical_slot_get_changes</primary> > </indexterm> > @@ -28177,7 +28177,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="pg-logical-slot-peek-changes" > role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>pg_logical_slot_peek_changes</primary> > </indexterm> > @@ -28232,7 +28232,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="pg-replication-slot-advance" > role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>pg_replication_slot_advance</primary> > </indexterm> > > Should these 3 functions say something about how their behaviour is affected > by 'standby_slot_names' and give a link back to the GUC 'standby_slot_names' > docs? I added the info for pg_logical_slot_get_changes() and pg_replication_slot_advance(). The pg_logical_slot_peek_changes() function clarifies that it behaves just like pg_logical_slot_get_changes(), so I didn’t touch it. > > ====== > src/backend/replication/slot.c > > 5. StandbySlotsHaveCaughtup > > + if (!slot) > + { > + /* > + * If the provided slot does not exist, report a message and exit > + * the loop. It is possible for a user to specify a slot name in > + * standby_slot_names that does not exist just before the server > + * startup. The GUC check_hook(validate_standby_slots) cannot > + * validate such a slot during startup as the ReplicationSlotCtl > + * shared memory is not initialized at that time. It is also > + * possible for a user to drop the slot in standby_slot_names > + * afterwards. > + */ > > 5a. > Minor rewording to make this code comment more similar to the next one: > > SUGGESTION > If a slot name provided in standby_slot_names does not exist, report a message > and exit the loop. A user can specify a slot name that does not exist just before > the server startup... Changed. 
> > ~ > > 5b. > + /* > + * If a logical slot name is provided in standby_slot_names, > + * report a message and exit the loop. Similar to the non-existent > + * case, it is possible for a user to specify a logical slot name > + * in standby_slot_names before the server startup, or drop an > + * existing physical slot and recreate a logical slot with the > + * same name. > + */ > > /it is possible for a user to specify/a user can specify/ Changed. > > ~~~ > > 6. WaitForStandbyConfirmation > > + /* > + * We wait for the slots in the standby_slot_names to catch up, but we > + * use a timeout (1s) so we can also check if the standby_slot_names > + * has been changed. > + */ > > Remove some of the "we". > > SUGGESTION > Wait for the slots in the standby_slot_names to catch up, but use a timeout (1s) > so we can also check if the standby_slot_names has been changed. Changed. > > ====== > src/backend/replication/walsender.c > > 7. NeedToWaitForStandby > > +/* > + * Returns true if not all standbys have caught up to the flushed > +position > + * (flushed_lsn) when failover_slot is true; otherwise, returns false. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > > The function comment refers to 'failover_slot' but IMO needs to be worded > differently because 'failover_slot' is not a parameter anymore. Changed. > > ~~~ > > 8. NeedToWaitForWal > > +/* > + * Returns true if we need to wait for WALs to be flushed to disk, or > +if not > + * all standbys have caught up to the flushed position (flushed_lsn) > +when > + * failover_slot is true; otherwise, returns false. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > Same as above -- Now that 'failover_slot' is not a function parameter, I thought > this should be reworded. Changed. > > ~~~ > > 9. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. > + It > + * is good to wait up to flushed position and then let it send the > + changes > + * to logical subscribers one by one which are already covered in > + flushed > + * position without needing to wait on every change for standby > + * confirmation. Note that after receiving the shutdown signal, an > + ERROR > + * is reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) return > + true; > > I felt it was confusing things for this function to also call to the other one -- it > seems an overlapping/muddling of the purpose of these. > I think it will be easier to understand if the calling code just calls to one or both > of these functions as required. Same as Amit, I didn't change this. > > ~~~ > > 10. > > - WalSndWait(wakeEvents, sleeptime, > WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL); > + WalSndWait(wakeEvents, sleeptime, wait_event); > > Tracing the assignments of the 'wait_event' is a bit tricky... IIUC it should not be > 0 when we got to this point, so maybe it is better to put Assert(wait_event) > before this call? Added. Best Regards, Hou zj
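To make the behaviour being documented above concrete, here is a minimal usage sketch. The standby_slot_names GUC and the failover slot option are taken from the patch under discussion; the slot names, the use of the contrib test_decoding plugin, and the ALTER SYSTEM/reload steps are illustrative assumptions rather than part of the patch.

-- Primary: list the physical slot(s) whose standbys must confirm WAL
-- receipt before logical changes are handed out (slot name is made up).
ALTER SYSTEM SET standby_slot_names = 'sb1_slot';
SELECT pg_reload_conf();

-- Physical slot used by the standby, plus a failover-enabled logical slot
-- (test_decoding is used only so the SQL functions can print changes).
SELECT pg_create_physical_replication_slot('sb1_slot');
SELECT pg_create_logical_replication_slot('failover_slot', 'test_decoding',
                                           false,  -- temporary
                                           false,  -- two-phase
                                           true);  -- failover

-- With standby_slot_names configured, these calls do not return until the
-- standby using 'sb1_slot' has confirmed receipt of WAL up to the position
-- being returned or advanced to.
SELECT * FROM pg_logical_slot_get_changes('failover_slot', NULL, NULL);
SELECT pg_replication_slot_advance('failover_slot', pg_current_wal_lsn());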
On Saturday, March 2, 2024 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Apart from the comments, the code in WalSndWaitForWal was refactored a > > bit to make it neater. Thanks Shveta for helping writing the code and doc. > > > > A few more comments: Thanks for the comments. > ================== > 1. > +# Wait until the primary server logs a warning indicating that it is > +waiting # for the sb1_slot to catch up. > +$primary->wait_for_log( > + qr/replication slot \"sb1_slot\" specified in parameter > standby_slot_names does not have active_pid/, > + $offset); > > Shouldn't we wait for such a LOG even in the first test as well which involves two > standbys and two logical subscribers? Yes, we should. Added. > > 2. > +################################################## > +# Test that logical replication will wait for the user-created inactive > +# physical slot to catch up until we remove the slot from standby_slot_names. > +################################################## > > > I don't see anything different tested in this test from what we already tested in > the first test involving two standbys and two logical subscribers. Can you > please clarify if I am missing something? I think the intention was to test that the wait loop is ended due to GUC config reload, while the first test is for the case when the loop is ended due to restart_lsn movement. But it seems we tested the config reload with xx_get_changes() as well, so I can remove it if you agree. > > 3. > Note that after receiving the shutdown signal, an ERROR > + * is reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > Isn't this part of the comment should be moved inside > NeedToWaitForStandby()? Moved. > > 4. > + /* > + * Update our idea of the currently flushed position only if we are > + * not waiting for standbys to catch up, otherwise the standby would > + * have to catch up to a newer WAL location in each cycle. > + */ > + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) > + { > > This functionality (in function WalSndWaitForWal()) seems to ensure that we > first wait for the required WAL to be flushed and then wait for standbys. If true, > we should cover that point in the comments here or somewhere in the function > WalSndWaitForWal(). > > Apart from this, I have made a few modifications in the comments. Thanks. I have reviewed and merged them. Here is the V104 patch which addressed above and Peter's comments. Best Regards, Hou zj
Attachment
On Sun, Mar 3, 2024 at 2:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Mar 3, 2024 at 5:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ... > > 9. NeedToWaitForWal > > > > + /* > > + * Check if the standby slots have caught up to the flushed position. It > > + * is good to wait up to flushed position and then let it send the changes > > + * to logical subscribers one by one which are already covered in flushed > > + * position without needing to wait on every change for standby > > + * confirmation. Note that after receiving the shutdown signal, an ERROR > > + * is reported if any slots are dropped, invalidated, or inactive. This > > + * measure is taken to prevent the walsender from waiting indefinitely. > > + */ > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > + return true; > > > > I felt it was confusing things for this function to also call to the > > other one -- it seems an overlapping/muddling of the purpose of these. > > I think it will be easier to understand if the calling code just calls > > to one or both of these functions as required. > > > > I felt otherwise because the caller has to call these functions at > more than one place which makes the caller's code difficult to follow. > It is better to encapsulate the computation of wait_event. > You may have misinterpreted my review comment because I didn't say anything about changing the encapsulation of the computation of the wait_event. I only wrote it is better IMO for the functions to stick to just one job each according to their function name. E.g.: - NeedToWaitForStandby should *only* check if we need to wait for standbys - NeedToWaitForWal should *only* check if we need to wait for WAL flush; i.e. this shouldn't be also calling NeedToWaitForStandby(). Also, AFAICT the caller changes should not be difficult. Indeed, these changes will make the code aligned properly with what the comments are saying: BEFORE /* * Fast path to avoid acquiring the spinlock in case we already know we * have enough WAL available and all the standby servers have confirmed * receipt of WAL up to RecentFlushPtr. This is particularly interesting * if we're far behind. */ if (!XLogRecPtrIsInvalid(RecentFlushPtr) && !NeedToWaitForWal(loc, RecentFlushPtr, &wait_event)) return RecentFlushPtr; SUGGESTED ... if (!XLogRecPtrIsInvalid(RecentFlushPtr) && !NeedToWaitForWal(loc, RecentFlushPtr, &wait_event) && !NeedToWaitForStandby(loc, RecentFlushPtr, &wait_event) return RecentFlushPtr; ~~~ BEFORE /* * Exit the loop if already caught up and doesn't need to wait for * standby slots. */ if (!wait_for_standby_at_stop && !NeedToWaitForWal(loc, RecentFlushPtr, &wait_event)) break; SUGGESTED ... if (!wait_for_standby_at_stop && !NeedToWaitForWal(loc, RecentFlushPtr, &wait_event) && !NeedToWaitForStandby(loc, RecentFlushPtr, &wait_event)) break; ---------- Kind Regards, Peter Smith. Fujitsu Australia
On Sun, Mar 3, 2024 at 6:51 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Sunday, March 3, 2024 7:47 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > 3. > > + <note> > > + <para> > > + Value <literal>*</literal> is not accepted as it is inappropriate to > > + block logical replication for physical slots that either lack > > + associated standbys or have standbys associated that are not > > enabled > > + for replication slot synchronization. (see > > + <xref > > linkend="logicaldecoding-replication-slots-synchronization"/>). > > + </para> > > + </note> > > > > Why does the document need to provide an excuse/reason for the rule? > > You could just say something like: > > > > SUGGESTION > > The slots must be named explicitly. For example, specifying wildcard values like > > <literal>*</literal> is not permitted. > > As suggested by Amit, I moved this to code comments. Was the total removal of this note deliberate? I only suggested removing the *reason* for the rule, not the entire rule. Otherwise, the user won't know to avoid doing this until they try it and get an error. > > > > 9. NeedToWaitForWal > > > > + /* > > + * Check if the standby slots have caught up to the flushed position. > > + It > > + * is good to wait up to flushed position and then let it send the > > + changes > > + * to logical subscribers one by one which are already covered in > > + flushed > > + * position without needing to wait on every change for standby > > + * confirmation. Note that after receiving the shutdown signal, an > > + ERROR > > + * is reported if any slots are dropped, invalidated, or inactive. This > > + * measure is taken to prevent the walsender from waiting indefinitely. > > + */ > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) return > > + true; > > > > I felt it was confusing things for this function to also call to the other one -- it > > seems an overlapping/muddling of the purpose of these. > > I think it will be easier to understand if the calling code just calls to one or both > > of these functions as required. > > Same as Amit, I didn't change this. AFAICT my previous review comment was misinterpreted. Please see [1] for more details. ~~~~ Here are some more review comments for v104-00001 ====== Commit message 1. Additionally, The SQL functions pg_logical_slot_get_changes, pg_logical_slot_peek_changes and pg_replication_slot_advance are modified to wait for the replication slots specified in 'standby_slot_names' to catch up before returning. ~ Maybe that should be expressed using similar wording as the docs... SUGGESTION Additionally, The SQL functions ... are modified. Now, when used with failover-enabled logical slots, these functions will block until all physical slots specified in 'standby_slot_names' have confirmed WAL receipt. ====== doc/src/sgml/config.sgml 2. + <function>pg_logical_slot_peek_changes</function></link>, + when used with failover enabled logical slots, will block until all + physical slots specified in <varname>standby_slot_names</varname> have + confirmed WAL receipt. /failover enabled logical slots/failover-enabled logical slots/ ====== doc/src/sgml/func.sgml 3. + The function may be blocked if the specified slot is a failover enabled + slot and <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + is configured. </para></entry> /a failover enabled slot//a failover-enabled slot/ ~~~ 4. + slot may return to an earlier position. 
The function may be blocked if + the specified slot is a failover enabled slot and + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + is configured. /a failover enabled slot//a failover-enabled slot/ ====== src/backend/replication/slot.c 5. +/* + * Wait for physical standbys to confirm receiving the given lsn. + * + * Used by logical decoding SQL functions that acquired failover enabled slot. + * It waits for physical standbys corresponding to the physical slots specified + * in the standby_slot_names GUC. + */ +void +WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) /failover enabled slot/failover-enabled slot/ ~~~ 6. + /* + * Don't need to wait for the standby to catch up if the current acquired + * slot is not a failover enabled slot, or there is no value in + * standby_slot_names. + */ /failover enabled slot/failover-enabled slot/ ====== src/backend/replication/slotfuncs.c 7. + + /* + * Wake up logical walsenders holding failover enabled slots after + * updating the restart_lsn of the physical slot. + */ + PhysicalWakeupLogicalWalSnd(); /failover enabled slots/failover-enabled slots/ ====== src/backend/replication/walsender.c 8. +/* + * Wake up the logical walsender processes with failover enabled slots if the + * currently acquired physical slot is specified in standby_slot_names GUC. + */ +void +PhysicalWakeupLogicalWalSnd(void) /failover enabled slots/failover-enabled slots/ 9. +/* + * Returns true if not all standbys have caught up to the flushed position + * (flushed_lsn) when the current acquired slot is a failover enabled logical + * slot and we are streaming; otherwise, returns false. + * + * If returning true, the function sets the appropriate wait event in + * wait_event; otherwise, wait_event is set to 0. + */ +static bool +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, + uint32 *wait_event) 9a. /failover enabled logical slot/failover-enabled logical slot/ ~ 9b. Probably that function name should be plural. /NeedToWaitForStandby/NeedToWaitForStandbys/ ~~~ 10. +/* + * Returns true if we need to wait for WALs to be flushed to disk, or if not + * all standbys have caught up to the flushed position (flushed_lsn) when the + * current acquired slot is a failover enabled logical slot and we are + * streaming; otherwise, returns false. + * + * If returning true, the function sets the appropriate wait event in + * wait_event; otherwise, wait_event is set to 0. + */ +static bool +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, + uint32 *wait_event) /failover enabled logical slot/failover-enabled logical slot/ ~~~ 11. WalSndWaitForWal + /* + * Within the loop, we wait for the necessary WALs to be flushed to + * disk first, followed by waiting for standbys to catch up if there + * are enought WALs or upon receiving the shutdown signal. To avoid + * the scenario where standbys need to catch up to a newer WAL + * location in each iteration, we update our idea of the currently + * flushed position only if we are not waiting for standbys to catch + * up. + */ typo /enought/enough/ ---------- [1] https://www.postgresql.org/message-id/CAHut%2BPteoyDki-XdygDgoaZJLmasutzRquQepYx0raNs0RSMvg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Austalia
On Mon, Mar 4, 2024 at 6:57 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sun, Mar 3, 2024 at 2:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sun, Mar 3, 2024 at 5:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > ... > > > 9. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed position. It > > > + * is good to wait up to flushed position and then let it send the changes > > > + * to logical subscribers one by one which are already covered in flushed > > > + * position without needing to wait on every change for standby > > > + * confirmation. Note that after receiving the shutdown signal, an ERROR > > > + * is reported if any slots are dropped, invalidated, or inactive. This > > > + * measure is taken to prevent the walsender from waiting indefinitely. > > > + */ > > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > > + return true; > > > > > > I felt it was confusing things for this function to also call to the > > > other one -- it seems an overlapping/muddling of the purpose of these. > > > I think it will be easier to understand if the calling code just calls > > > to one or both of these functions as required. > > > > > > > I felt otherwise because the caller has to call these functions at > > more than one place which makes the caller's code difficult to follow. > > It is better to encapsulate the computation of wait_event. > > > > You may have misinterpreted my review comment because I didn't say > anything about changing the encapsulation of the computation of the > wait_event. > No, I have understood it in the same way as you have outlined in this email and liked the way the current patch has it. -- With Regards, Amit Kapila.
On Mon, Mar 4, 2024 at 7:25 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > ====== > doc/src/sgml/config.sgml > > 2. > + <function>pg_logical_slot_peek_changes</function></link>, > + when used with failover enabled logical slots, will block until all > + physical slots specified in <varname>standby_slot_names</varname> have > + confirmed WAL receipt. > > /failover enabled logical slots/failover-enabled logical slots/ > How about just saying logical failover slots at this and other places? -- With Regards, Amit Kapila.
On Mon, Mar 4, 2024 at 2:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 4, 2024 at 7:25 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > ====== > > doc/src/sgml/config.sgml > > > > 2. > > + <function>pg_logical_slot_peek_changes</function></link>, > > + when used with failover enabled logical slots, will block until all > > + physical slots specified in <varname>standby_slot_names</varname> have > > + confirmed WAL receipt. > > > > /failover enabled logical slots/failover-enabled logical slots/ > > > > How about just saying logical failover slots at this and other places? > Yes, that wording works too. ---------- Kind Regards, Peter Smith. Fujitsu Australia.
On Mon, Mar 4, 2024 at 2:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 4, 2024 at 6:57 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Sun, Mar 3, 2024 at 2:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sun, Mar 3, 2024 at 5:17 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > ... > > > > 9. NeedToWaitForWal > > > > > > > > + /* > > > > + * Check if the standby slots have caught up to the flushed position. It > > > > + * is good to wait up to flushed position and then let it send the changes > > > > + * to logical subscribers one by one which are already covered in flushed > > > > + * position without needing to wait on every change for standby > > > > + * confirmation. Note that after receiving the shutdown signal, an ERROR > > > > + * is reported if any slots are dropped, invalidated, or inactive. This > > > > + * measure is taken to prevent the walsender from waiting indefinitely. > > > > + */ > > > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > > > + return true; > > > > > > > > I felt it was confusing things for this function to also call to the > > > > other one -- it seems an overlapping/muddling of the purpose of these. > > > > I think it will be easier to understand if the calling code just calls > > > > to one or both of these functions as required. > > > > > > > > > > I felt otherwise because the caller has to call these functions at > > > more than one place which makes the caller's code difficult to follow. > > > It is better to encapsulate the computation of wait_event. > > > > > > > You may have misinterpreted my review comment because I didn't say > > anything about changing the encapsulation of the computation of the > > wait_event. > > > > No, I have understood it in the same way as you have outlined in this > email and liked the way the current patch has it. > OK, if the code will remain as-is wouldn't it be better to anyway change the function name to indicate what it really does? e.g. NeedToWaitForWal --> NeedToWaitForWalFlushOrStandbys ---------- Kind Regards, Peter Smith. Fujitsu Australia
Hi, On Thu, Feb 29, 2024 at 03:38:59PM +0530, Amit Kapila wrote: > On Thu, Feb 29, 2024 at 9:13 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Tue, Feb 27, 2024 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > Also, adding wait sounds > > > more like a boolean. So, I don't see the proposed names any better > > > than the current one. > > > > > > > Anyway, the point is that the current GUC name 'standby_slot_names' is > > not ideal IMO because it doesn't have enough meaning by itself -- e.g. > > you have to read the accompanying comment or documentation to have any > > idea of its purpose. > > > > Yeah, one has to read the description but that is true for other > parameters like "temp_tablespaces". I don't have any better ideas but > open to suggestions. What about "non_lagging_standby_slots"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Mar 4, 2024 at 9:35 AM Peter Smith <smithpb2250@gmail.com> wrote: > > OK, if the code will remain as-is wouldn't it be better to anyway > change the function name to indicate what it really does? > > e.g. NeedToWaitForWal --> NeedToWaitForWalFlushOrStandbys > This seems too long. I would prefer the current name NeedToWaitForWal as waiting for WAL means waiting to flush the WAL and waiting to replicate it to standby. On similar lines, the variable name standby_slot_oldest_flush_lsn looks too long. How about ss_oldest_flush_lsn (where ss indicates standby_slots)? Apart from this, I have made minor modifications in the attached. -- With Regards, Amit Kapila.
Attachment
Hi, On Sun, Mar 03, 2024 at 07:56:32AM +0000, Zhijie Hou (Fujitsu) wrote: > Here is the V104 patch which addressed above and Peter's comments. Thanks! A few more random comments: 1 === + The function may be blocked if the specified slot is a failover enabled s/blocked/waiting/ ? 2 === + * specified slot when waiting for them to catch up. See + * StandbySlotsHaveCaughtup for details. s/StandbySlotsHaveCaughtup/StandbySlotsHaveCaughtup()/ ? 3 === + /* Now verify if the specified slots really exist and have correct type */ remove "really"? 4 === + /* + * Don't need to wait for the standbys to catch up if there is no value in + * standby_slot_names. + */ + if (standby_slot_names_list == NIL) + return true; + + /* + * Don't need to wait for the standbys to catch up if we are on a standby + * server, since we do not support syncing slots to cascading standbys. + */ + if (RecoveryInProgress()) + return true; + + /* + * Don't need to wait for the standbys to catch up if they are already + * beyond the specified WAL location. + */ + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && + standby_slot_oldest_flush_lsn >= wait_for_lsn) + return true; What about using OR conditions instead? 5 === +static bool +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, + uint32 *wait_event) Not a big deal but does it need to return a bool? (I mean it all depends of the *wait_event value). Is it for better code readability in the caller? 6 === +static bool +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, + uint32 *wait_event) Same questions as for NeedToWaitForStandby(). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Mar 4, 2024 at 4:52 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Sun, Mar 03, 2024 at 07:56:32AM +0000, Zhijie Hou (Fujitsu) wrote: > > Here is the V104 patch which addressed above and Peter's comments. > > Thanks! > > > 4 === > > + /* > + * Don't need to wait for the standbys to catch up if there is no value in > + * standby_slot_names. > + */ > + if (standby_slot_names_list == NIL) > + return true; > + > + /* > + * Don't need to wait for the standbys to catch up if we are on a standby > + * server, since we do not support syncing slots to cascading standbys. > + */ > + if (RecoveryInProgress()) > + return true; > + > + /* > + * Don't need to wait for the standbys to catch up if they are already > + * beyond the specified WAL location. > + */ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) > + return true; > > What about using OR conditions instead? > I think we can use but it seems code is easier to follow this way but this is just a matter of personal choice. > 5 === > > +static bool > +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > Not a big deal but does it need to return a bool? (I mean it all depends of > the *wait_event value). Is it for better code readability in the caller? > Yes, I think so. Adding checks based on wait_events sounds a bit awkward. -- With Regards, Amit Kapila.
On Monday, March 4, 2024 7:22 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Sun, Mar 03, 2024 at 07:56:32AM +0000, Zhijie Hou (Fujitsu) wrote: > > Here is the V104 patch which addressed above and Peter's comments. > > Thanks! > > A few more random comments: Thanks for the comments! > > 1 === > > + The function may be blocked if the specified slot is a failover > + enabled > > s/blocked/waiting/ ? Changed. > > 2 === > > + * specified slot when waiting for them to catch up. See > + * StandbySlotsHaveCaughtup for details. > > s/StandbySlotsHaveCaughtup/StandbySlotsHaveCaughtup()/ ? Changed. > > 3 === > > + /* Now verify if the specified slots really exist and have > + correct type */ > > remove "really"? Changed. > > 4 === > > + /* > + * Don't need to wait for the standbys to catch up if there is no value in > + * standby_slot_names. > + */ > + if (standby_slot_names_list == NIL) > + return true; > + > + /* > + * Don't need to wait for the standbys to catch up if we are on a > standby > + * server, since we do not support syncing slots to cascading standbys. > + */ > + if (RecoveryInProgress()) > + return true; > + > + /* > + * Don't need to wait for the standbys to catch up if they are already > + * beyond the specified WAL location. > + */ > + if (!XLogRecPtrIsInvalid(standby_slot_oldest_flush_lsn) && > + standby_slot_oldest_flush_lsn >= wait_for_lsn) > + return true; > > What about using OR conditions instead? > > 5 === > > +static bool > +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > Not a big deal but does it need to return a bool? (I mean it all depends of the > *wait_event value). Is it for better code readability in the caller? > > 6 === > > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > Same questions as for NeedToWaitForStandby(). I also feel the current style looks a bit cleaner, so didn’t change these. Best Regards, Hou zj
On Monday, March 4, 2024 9:55 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sun, Mar 3, 2024 at 6:51 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Sunday, March 3, 2024 7:47 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > > > > 3. > > > + <note> > > > + <para> > > > + Value <literal>*</literal> is not accepted as it is inappropriate to > > > + block logical replication for physical slots that either lack > > > + associated standbys or have standbys associated that are > > > + not > > > enabled > > > + for replication slot synchronization. (see > > > + <xref > > > linkend="logicaldecoding-replication-slots-synchronization"/>). > > > + </para> > > > + </note> > > > > > > Why does the document need to provide an excuse/reason for the rule? > > > You could just say something like: > > > > > > SUGGESTION > > > The slots must be named explicitly. For example, specifying wildcard > > > values like <literal>*</literal> is not permitted. > > > > As suggested by Amit, I moved this to code comments. > > Was the total removal of this note deliberate? I only suggested removing the > *reason* for the rule, not the entire rule. Otherwise, the user won't know to > avoid doing this until they try it and get an error. OK, Added. > > > > > > > > 9. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed position. > > > + It > > > + * is good to wait up to flushed position and then let it send the > > > + changes > > > + * to logical subscribers one by one which are already covered in > > > + flushed > > > + * position without needing to wait on every change for standby > > > + * confirmation. Note that after receiving the shutdown signal, an > > > + ERROR > > > + * is reported if any slots are dropped, invalidated, or inactive. > > > + This > > > + * measure is taken to prevent the walsender from waiting indefinitely. > > > + */ > > > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) > > > + return true; > > > > > > I felt it was confusing things for this function to also call to the > > > other one -- it seems an overlapping/muddling of the purpose of these. > > > I think it will be easier to understand if the calling code just > > > calls to one or both of these functions as required. > > > > Same as Amit, I didn't change this. > > AFAICT my previous review comment was misinterpreted. Please see [1] for > more details. > > ~~~~ > > Here are some more review comments for v104-00001 Thanks! > > ====== > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots specified in 'standby_slot_names' to catch up > before returning. > > ~ > > Maybe that should be expressed using similar wording as the docs... > > SUGGESTION > Additionally, The SQL functions ... are modified. Now, when used with > failover-enabled logical slots, these functions will block until all physical slots > specified in 'standby_slot_names' have confirmed WAL receipt. Changed. > > ====== > doc/src/sgml/config.sgml > > 2. > + <function>pg_logical_slot_peek_changes</function></link>, > + when used with failover enabled logical slots, will block until all > + physical slots specified in > <varname>standby_slot_names</varname> have > + confirmed WAL receipt. > > /failover enabled logical slots/failover-enabled logical slots/ Changed. 
Note that for this comment and remaining comments, I used the later version we agreed(logical failover slot). > > ====== > doc/src/sgml/func.sgml > > 3. > + The function may be blocked if the specified slot is a failover enabled > + slot and <link > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me></link> > + is configured. > </para></entry> > > /a failover enabled slot//a failover-enabled slot/ Changed. > > ~~~ > > 4. > + slot may return to an earlier position. The function may be blocked if > + the specified slot is a failover enabled slot and > + <link > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me></link> > + is configured. > > /a failover enabled slot//a failover-enabled slot/ Changed. > > ====== > src/backend/replication/slot.c > > 5. > +/* > + * Wait for physical standbys to confirm receiving the given lsn. > + * > + * Used by logical decoding SQL functions that acquired failover enabled slot. > + * It waits for physical standbys corresponding to the physical slots > +specified > + * in the standby_slot_names GUC. > + */ > +void > +WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn) > > /failover enabled slot/failover-enabled slot/ Changed. > > ~~~ > > 6. > + /* > + * Don't need to wait for the standby to catch up if the current > + acquired > + * slot is not a failover enabled slot, or there is no value in > + * standby_slot_names. > + */ > > /failover enabled slot/failover-enabled slot/ Changed. > > ====== > src/backend/replication/slotfuncs.c > > 7. > + > + /* > + * Wake up logical walsenders holding failover enabled slots after > + * updating the restart_lsn of the physical slot. > + */ > + PhysicalWakeupLogicalWalSnd(); > > /failover enabled slots/failover-enabled slots/ Changed. > > ====== > src/backend/replication/walsender.c > > 8. > +/* > + * Wake up the logical walsender processes with failover enabled slots > +if the > + * currently acquired physical slot is specified in standby_slot_names GUC. > + */ > +void > +PhysicalWakeupLogicalWalSnd(void) > > /failover enabled slots/failover-enabled slots/ Changed. > > 9. > +/* > + * Returns true if not all standbys have caught up to the flushed > +position > + * (flushed_lsn) when the current acquired slot is a failover enabled > +logical > + * slot and we are streaming; otherwise, returns false. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > +static bool > +NeedToWaitForStandby(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > 9a. > /failover enabled logical slot/failover-enabled logical slot/ Changed. > > ~ > > 9b. > Probably that function name should be plural. > > /NeedToWaitForStandby/NeedToWaitForStandbys/ > > ~~~ > > 10. > +/* > + * Returns true if we need to wait for WALs to be flushed to disk, or > +if not > + * all standbys have caught up to the flushed position (flushed_lsn) > +when the > + * current acquired slot is a failover enabled logical slot and we are > + * streaming; otherwise, returns false. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > /failover enabled logical slot/failover-enabled logical slot/ Changed. > > ~~~ > > 11. 
WalSndWaitForWal > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enought WALs or upon receiving the shutdown signal. To avoid > + * the scenario where standbys need to catch up to a newer WAL > + * location in each iteration, we update our idea of the currently > + * flushed position only if we are not waiting for standbys to catch > + * up. > + */ > > typo > > /enought/enough/ Fixed. Best Regards, Hou zj
On Monday, March 4, 2024 5:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 4, 2024 at 9:35 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > OK, if the code will remain as-is wouldn't it be better to anyway > > change the function name to indicate what it really does? > > > > e.g. NeedToWaitForWal --> NeedToWaitForWalFlushOrStandbys > > > > This seems too long. I would prefer the current name NeedToWaitForWal as > waiting for WAL means waiting to flush the WAL and waiting to replicate it to > standby. On similar lines, the variable name standby_slot_oldest_flush_lsn looks > too long. How about ss_oldest_flush_lsn (where ss indicates standby_slots)? > > Apart from this, I have made minor modifications in the attached. Thanks, I have merged it. Attach the V105 patch set which addressed Peter, Amit and Bertrand's comments. This version also includes the following changes: * We found a string matching issue for query_until() and fixed it. * Removed one unused parameter from NeedToWaitForStandbys. * Disabled the subscription before testing pg_logical_slot_get_changes in 040.pl, to prevent this test from catching the warning from another walsender. * Ran pgindent. Best Regards, Hou zj
Attachment
Hi, On Mon, Mar 04, 2024 at 01:28:04PM +0000, Zhijie Hou (Fujitsu) wrote: > Attach the V105 patch set Thanks! Sorry I missed those during the previous review: 1 === Commit message: "these functions will block until" s/block/wait/ ? 2 === + when used with logical failover slots, will block until all s/block/wait/ ? It seems those are the 2 remaining "block" that could deserve the proposed above change. 3 === + invalidated = slot->data.invalidated != RS_INVAL_NONE; + inactive = slot->active_pid == 0; invalidated = (slot->data.invalidated != RS_INVAL_NONE); inactive = (slot->active_pid == 0); instead? I think it's easier to read and it looks like this is the way it's written in other places (at least the few I checked). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Here are some review comments for v105-0001 ========== doc/src/sgml/config.sgml 1. + <para> + The standbys corresponding to the physical replication slots in + <varname>standby_slot_names</varname> must configure + <literal>sync_replication_slots = true</literal> so they can receive + logical failover slots changes from the primary. + </para> /slots changes/slot changes/ ====== doc/src/sgml/func.sgml 2. + The function may be waiting if the specified slot is a logical failover + slot and <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + is configured. I know this has been through multiple versions already, but this latest wording "may be waiting..." doesn't seem very good to me. How about one of these? * The function may not be able to return immediately if the specified slot is a logical failover slot and standby_slot_names is configured. * The function return might be blocked if the specified slot is a logical failover slot and standby_slot_names is configured. * If the specified slot is a logical failover slot then the function will block until all physical slots specified in standby_slot_names have confirmed WAL receipt. * If the specified slot is a logical failover slot then the function will not return until all physical slots specified in standby_slot_names have confirmed WAL receipt. ~~~ 3. + slot may return to an earlier position. The function may be waiting if + the specified slot is a logical failover slot and + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> Same as previous review comment #2 ====== src/backend/replication/slot.c 4. WaitForStandbyConfirmation + * Used by logical decoding SQL functions that acquired logical failover slot. IIUC it doesn't work like that. pg_logical_slot_get_changes_guts() calls here unconditionally (i.e. the SQL functions don't even check if they are failover slots before calling this) so the comment seems misleading/redundant. ====== src/backend/replication/walsender.c 5. NeedToWaitForWal + /* + * Check if the standby slots have caught up to the flushed position. It + * is good to wait up to the flushed position and then let the WalSender + * send the changes to logical subscribers one by one which are already + * covered by the flushed position without needing to wait on every change + * for standby confirmation. + */ + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) + return true; + + *wait_event = 0; + return false; +} + 5a. The comment (or part of it?) seems misplaced because it is talking WalSender sending changes, but that is not happening in this function. Also, isn't what this is saying already described by the other comment in the caller? e.g.: + /* + * Within the loop, we wait for the necessary WALs to be flushed to + * disk first, followed by waiting for standbys to catch up if there + * are enough WALs or upon receiving the shutdown signal. To avoid the + * scenario where standbys need to catch up to a newer WAL location in + * each iteration, we update our idea of the currently flushed + * position only if we are not waiting for standbys to catch up. + */ ~ 5b. Most of the code is unnecessary. AFAICT all this is exactly same as just 1 line: return NeedToWaitForStandbys(flushed_lsn, wait_event); ~~~ 6. WalSndWaitForWal + /* + * Within the loop, we wait for the necessary WALs to be flushed to + * disk first, followed by waiting for standbys to catch up if there + * are enough WALs or upon receiving the shutdown signal. 
To avoid the + * scenario where standbys need to catch up to a newer WAL location in + * each iteration, we update our idea of the currently flushed + * position only if we are not waiting for standbys to catch up. + */ Regarding that 1st sentence: maybe this logic used to be done explicitly "within the loop" but IIUC this logic is now hidden inside NeedToWaitForWal() so the comment should mention that. ---------- Kind Regards, Peter Smith. Fujitsu Australia
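As a companion to the sync_replication_slots requirement mentioned in the doc text above, a rough sketch of the standby-side settings this workflow appears to assume is shown below. Apart from being pointed at by the primary's standby_slot_names, these parameters come from the existing slot-synchronization feature, and the connection-string values are made up.

-- On each physical standby whose slot is listed in the primary's
-- standby_slot_names (all values below are illustrative).
ALTER SYSTEM SET primary_conninfo = 'host=primary port=5432 user=repl dbname=postgres';
ALTER SYSTEM SET primary_slot_name = 'sb1_slot';
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET sync_replication_slots = on;
SELECT pg_reload_conf();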
On Mon, Mar 4, 2024 at 2:27 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Feb 29, 2024 at 03:38:59PM +0530, Amit Kapila wrote: > > On Thu, Feb 29, 2024 at 9:13 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > On Tue, Feb 27, 2024 at 11:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > Also, adding wait sounds > > > > more like a boolean. So, I don't see the proposed names any better > > > > than the current one. > > > > > > > > > > Anyway, the point is that the current GUC name 'standby_slot_names' is > > > not ideal IMO because it doesn't have enough meaning by itself -- e.g. > > > you have to read the accompanying comment or documentation to have any > > > idea of its purpose. > > > > > > > Yeah, one has to read the description but that is true for other > > parameters like "temp_tablespaces". I don't have any better ideas but > > open to suggestions. > > What about "non_lagging_standby_slots"? > I still prefer the current one as that at least resembles with existing synchronous_standby_names. I think we can change the GUC name if we get an agreement on a better name before release. At this stage, let's move with the current one. -- With Regards, Amit Kapila.
On Tue, Mar 5, 2024 at 6:10 AM Peter Smith <smithpb2250@gmail.com> wrote: > > ====== > src/backend/replication/walsender.c > > 5. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. It > + * is good to wait up to the flushed position and then let the WalSender > + * send the changes to logical subscribers one by one which are already > + * covered by the flushed position without needing to wait on every change > + * for standby confirmation. > + */ > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) > + return true; > + > + *wait_event = 0; > + return false; > +} > + > > 5a. > The comment (or part of it?) seems misplaced because it is talking > WalSender sending changes, but that is not happening in this function. > I don't think so. This is invoked only by walsender and a static function. I don't see any other better place to mention this. > Also, isn't what this is saying already described by the other comment > in the caller? e.g.: > Oh no, here we are explaining the wait order. -- With Regards, Amit Kapila.
On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 5, 2024 at 6:10 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > ====== > > src/backend/replication/walsender.c > > > > 5. NeedToWaitForWal > > > > + /* > > + * Check if the standby slots have caught up to the flushed position. It > > + * is good to wait up to the flushed position and then let the WalSender > > + * send the changes to logical subscribers one by one which are already > > + * covered by the flushed position without needing to wait on every change > > + * for standby confirmation. > > + */ > > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) > > + return true; > > + > > + *wait_event = 0; > > + return false; > > +} > > + > > > > 5a. > > The comment (or part of it?) seems misplaced because it is talking > > WalSender sending changes, but that is not happening in this function. > > > > I don't think so. This is invoked only by walsender and a static > function. I don't see any other better place to mention this. > > > Also, isn't what this is saying already described by the other comment > > in the caller? e.g.: > > > > Oh no, here we are explaining the wait order. I think there is a scope of improvement here. The comment inside NeedToWaitForWal() which states that we need to wait here for standbys on flush-position(and not on each change) should be outside of this function. It is too embedded. And the comment which states the order of wait (first flush and then standbys confirmation) should be outside the for-loop in WalSndWaitForWal(), but yes we do need both the comments. Attached a patch (.txt) for comments improvement, please merge if appropriate. thanks Shveta
Attachment
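Since the ordering described in these comment discussions (wait for the local WAL flush first, then for the standbys in standby_slot_names to confirm the flushed position) keeps coming up, here is a deliberately simplified, standalone C model of that control flow. It is only a sketch: the types, wait-event values, and stub variables are stand-ins, not the walsender code from the patch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;	/* stand-in for PostgreSQL's XLogRecPtr */

#define WAIT_EVENT_WAL_FLUSH		1	/* stand-in wait-event values */
#define WAIT_EVENT_STANDBY_CONFIRM	2

/* Oldest confirmed flush LSN among the slots in standby_slot_names (stub). */
static XLogRecPtr ss_oldest_flush_lsn = 90;

/* True if some standby listed in standby_slot_names is still behind. */
static bool
NeedToWaitForStandbys(XLogRecPtr flushed_lsn, uint32_t *wait_event)
{
	if (ss_oldest_flush_lsn < flushed_lsn)
	{
		*wait_event = WAIT_EVENT_STANDBY_CONFIRM;
		return true;
	}
	*wait_event = 0;
	return false;
}

/*
 * True if the WAL up to target_lsn is not yet flushed locally, or if the
 * standbys have not yet confirmed the flushed position.  Waiting only up to
 * the flushed position (rather than on every change) is the point discussed
 * in the comments above.
 */
static bool
NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn,
				 uint32_t *wait_event)
{
	if (target_lsn > flushed_lsn)
	{
		*wait_event = WAIT_EVENT_WAL_FLUSH;
		return true;
	}
	return NeedToWaitForStandbys(flushed_lsn, wait_event);
}

int
main(void)
{
	XLogRecPtr	target = 100;	/* WAL location the logical walsender needs */
	XLogRecPtr	flushed = 80;	/* currently flushed position (stub) */
	uint32_t	wait_event = 0;

	/* Loose analogue of the loop in WalSndWaitForWal(). */
	while (NeedToWaitForWal(target, flushed, &wait_event))
	{
		/* The real code sleeps in WalSndWait(); here we just simulate progress. */
		if (wait_event == WAIT_EVENT_WAL_FLUSH)
			flushed = target;				/* WAL got flushed */
		else
			ss_oldest_flush_lsn = flushed;	/* standbys caught up */
	}

	printf("flushed=%llu, standbys confirmed up to %llu\n",
		   (unsigned long long) flushed,
		   (unsigned long long) ss_oldest_flush_lsn);
	return 0;
}

The real functions also check whether the acquired slot is a logical failover slot and whether standby_slot_names is set at all before deciding to wait; those early exits are omitted here to keep the sketch short.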
I did performance tests for the v99 patch w.r.t. wait time analysis. As this patch introduces a wait for the standby before sending changes to a subscriber, at the primary node I logged the time at the start and end of the XLogSendLogical() call (which eventually calls WalSndWaitForWal()) and calculated the total time taken by this function during the load run. For the load, I ran pgbench for 15 minutes: Creating tables: pgbench -p 5833 postgres -qis 2 Running benchmark: pgbench postgres -p 5833 -c 10 -j 3 -T 900 -P 20 Machine details: 11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz 32GB RAM OS - Windows 10 Enterprise Test setup: a primary node, with one physical standby node and one subscriber node having only one subscription with failover=true -- the slot-sync relevant parameters are set to default (OFF) for all the tests, i.e. hot_standby_feedback = off sync_replication_slots = false -- additional configuration on each instance is: shared_buffers = 6GB max_worker_processes = 32 max_parallel_maintenance_workers = 24 max_parallel_workers = 32 synchronous_commit = off checkpoint_timeout = 1d max_wal_size = 24GB min_wal_size = 15GB autovacuum = off To review the wait time impact with and without the patch, I compared three cases (two runs for each case): (1) HEAD code: time taken in run 1 = 103.935631 seconds, time taken in run 2 = 104.832186 seconds (2) HEAD code + v99 patch ('standby_slot_names' is not set): time taken in run 1 = 104.076343 seconds, time taken in run 2 = 103.116226 seconds (3) HEAD code + v99 patch + a valid 'standby_slot_names' is set: time taken in run 1 = 103.871012 seconds, time taken in run 2 = 103.793524 seconds The time consumption of XLogSendLogical() is almost the same in all the cases and no performance degradation is observed. -- Thanks, Nisha
On Monday, March 4, 2024 11:44 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Mon, Mar 04, 2024 at 01:28:04PM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach the V105 patch set > > Thanks! > > Sorry I missed those during the previous review: No problem, thanks for the comments! > > 1 === > > Commit message: "these functions will block until" > > s/block/wait/ ? > > 2 === > > + when used with logical failover slots, will block until all > > s/block/wait/ ? > > It seems those are the 2 remaining "block" that could deserve the proposed > above change. I prefer using 'block' here, and it seems others have also suggested changing the 'wait' wording [1]. > > 3 === > > + invalidated = slot->data.invalidated != RS_INVAL_NONE; > + inactive = slot->active_pid == 0; > > invalidated = (slot->data.invalidated != RS_INVAL_NONE); inactive = > (slot->active_pid == 0); > > instead? > > I think it's easier to read and it looks like this is the way it's written in other > places (at least the few I checked). I think the current code is consistent with other similar code in slot.c. (grep "data.invalidated != RS_INVAL_NONE"). [1] https://www.postgresql.org/message-id/CAHut%2BPsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ%40mail.gmail.com Best Regards, Hou zj
On Tuesday, March 5, 2024 8:40 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v105-0001 > > ========== > doc/src/sgml/config.sgml > > 1. > + <para> > + The standbys corresponding to the physical replication slots in > + <varname>standby_slot_names</varname> must configure > + <literal>sync_replication_slots = true</literal> so they can receive > + logical failover slots changes from the primary. > + </para> > > /slots changes/slot changes/ Changed. > > ====== > doc/src/sgml/func.sgml > > 2. > + The function may be waiting if the specified slot is a logical failover > + slot and <link > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me></link> > + is configured. > > I know this has been through multiple versions already, but this > latest wording "may be waiting..." doesn't seem very good to me. > > How about one of these? > > * The function may not be able to return immediately if the specified > slot is a logical failover slot and standby_slot_names is configured. > > * The function return might be blocked if the specified slot is a > logical failover slot and standby_slot_names is configured. > > * If the specified slot is a logical failover slot then the function > will block until all physical slots specified in standby_slot_names > have confirmed WAL receipt. > > * If the specified slot is a logical failover slot then the function > will not return until all physical slots specified in > standby_slot_names have confirmed WAL receipt. I prefer the last one. > > ~~~ > > 3. > + slot may return to an earlier position. The function may be waiting if > + the specified slot is a logical failover slot and > + <link > linkend="guc-standby-slot-names"><varname>standby_slot_names</varna > me></link> > > > Same as previous review comment #2 Changed. > > ====== > src/backend/replication/slot.c > > 4. WaitForStandbyConfirmation > > + * Used by logical decoding SQL functions that acquired logical failover slot. > > IIUC it doesn't work like that. pg_logical_slot_get_changes_guts() > calls here unconditionally (i.e. the SQL functions don't even check if > they are failover slots before calling this) so the comment seems > misleading/redundant. I removed the "acquired logical failover slot.". > > ====== > src/backend/replication/walsender.c > > 5. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. It > + * is good to wait up to the flushed position and then let the WalSender > + * send the changes to logical subscribers one by one which are already > + * covered by the flushed position without needing to wait on every change > + * for standby confirmation. > + */ > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) > + return true; > + > + *wait_event = 0; > + return false; > +} > + > > 5a. > The comment (or part of it?) seems misplaced because it is talking > WalSender sending changes, but that is not happening in this function. > > Also, isn't what this is saying already described by the other comment > in the caller? e.g.: > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enough WALs or upon receiving the shutdown signal. To avoid the > + * scenario where standbys need to catch up to a newer WAL location in > + * each iteration, we update our idea of the currently flushed > + * position only if we are not waiting for standbys to catch up. 
> + */ > I moved these comments based on Shveta's suggestion. > ~ > > 5b. > Most of the code is unnecessary. AFAICT all this is exactly same as just 1 line: > > return NeedToWaitForStandbys(flushed_lsn, wait_event); Changed. > > ~~~ > > 6. WalSndWaitForWal > > + /* > + * Within the loop, we wait for the necessary WALs to be flushed to > + * disk first, followed by waiting for standbys to catch up if there > + * are enough WALs or upon receiving the shutdown signal. To avoid the > + * scenario where standbys need to catch up to a newer WAL location in > + * each iteration, we update our idea of the currently flushed > + * position only if we are not waiting for standbys to catch up. > + */ > > Regarding that 1st sentence: maybe this logic used to be done > explicitly "within the loop" but IIUC this logic is now hidden inside > NeedToWaitForWal() so the comment should mention that. Changed. Best Regards, Hou zj
On Tuesday, March 5, 2024 2:35 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Mar 5, 2024 at 6:10 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > > > ====== > > > src/backend/replication/walsender.c > > > > > > 5. NeedToWaitForWal > > > > > > + /* > > > + * Check if the standby slots have caught up to the flushed > > > + position. It > > > + * is good to wait up to the flushed position and then let the > > > + WalSender > > > + * send the changes to logical subscribers one by one which are > > > + already > > > + * covered by the flushed position without needing to wait on every > > > + change > > > + * for standby confirmation. > > > + */ > > > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) return true; > > > + > > > + *wait_event = 0; > > > + return false; > > > +} > > > + > > > > > > 5a. > > > The comment (or part of it?) seems misplaced because it is talking > > > WalSender sending changes, but that is not happening in this function. > > > > > > > I don't think so. This is invoked only by walsender and a static > > function. I don't see any other better place to mention this. > > > > > Also, isn't what this is saying already described by the other > > > comment in the caller? e.g.: > > > > > > > Oh no, here we are explaining the wait order. > > I think there is a scope of improvement here. The comment inside > NeedToWaitForWal() which states that we need to wait here for standbys on > flush-position(and not on each change) should be outside of this function. It is > too embedded. And the comment which states the order of wait (first flush and > then standbys confirmation) should be outside the for-loop in > WalSndWaitForWal(), but yes we do need both the comments. Attached a > patch (.txt) for comments improvement, please merge if appropriate. Thanks, I have slightly modified the top-up patch and merged it. Attach the V106 patch which addressed above and Peter's comments[1]. [1] https://www.postgresql.org/message-id/CAHut%2BPsATK8z1TEcfFE8zWoS1hagqsvaWYCgom_zYtScfwO7uQ%40mail.gmail.com Best Regards, Hou zj
Attachment
On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > --- > > +void > > +assign_standby_slot_names(const char *newval, void *extra) { > > + List *standby_slots; > > + MemoryContext oldcxt; > > + char *standby_slot_names_cpy = extra; > > + > > > > Given that the newval and extra have the same data (standby_slot_names > > value), why do we not use newval instead? I think that if we use > > newval, we don't need to guc_strdup() in check_standby_slot_names(), > > we might need to do list_copy_deep() instead, though. It's not clear > > to me as there is no comment. > > I think SplitIdentifierString will modify the passed in string, so we'd better > not pass the newval to it, otherwise the stored guc string(standby_slot_names) > will be changed. I can see we are doing similar thing in other GUC check/assign > function as well. (check_wal_consistency_checking/ > assign_wal_consistency_checking, check_createrole_self_grant/ > assign_createrole_self_grant ...). Why does it have to be a List in the first place? In earlier version patches, we used to copy the list and delete the element until it became empty, while waiting for physical wal senders. But we now just refer to each slot name in the list. The current code assumes that stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it changes, it will silently get broken. I think we can check and assign standby_slot_names in a similar way to check/assign_temp_tablespaces and check/assign_synchronous_standby_names. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Tue, Mar 5, 2024 at 4:21 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, March 5, 2024 2:35 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Mar 5, 2024 at 6:10 AM Peter Smith <smithpb2250@gmail.com> > > wrote: > > > > > > > > ====== > > > > src/backend/replication/walsender.c > > > > > > > > 5. NeedToWaitForWal > > > > > > > > + /* > > > > + * Check if the standby slots have caught up to the flushed > > > > + position. It > > > > + * is good to wait up to the flushed position and then let the > > > > + WalSender > > > > + * send the changes to logical subscribers one by one which are > > > > + already > > > > + * covered by the flushed position without needing to wait on every > > > > + change > > > > + * for standby confirmation. > > > > + */ > > > > + if (NeedToWaitForStandbys(flushed_lsn, wait_event)) return true; > > > > + > > > > + *wait_event = 0; > > > > + return false; > > > > +} > > > > + > > > > > > > > 5a. > > > > The comment (or part of it?) seems misplaced because it is talking > > > > WalSender sending changes, but that is not happening in this function. > > > > > > > > > > I don't think so. This is invoked only by walsender and a static > > > function. I don't see any other better place to mention this. > > > > > > > Also, isn't what this is saying already described by the other > > > > comment in the caller? e.g.: > > > > > > > > > > Oh no, here we are explaining the wait order. > > > > I think there is a scope of improvement here. The comment inside > > NeedToWaitForWal() which states that we need to wait here for standbys on > > flush-position(and not on each change) should be outside of this function. It is > > too embedded. And the comment which states the order of wait (first flush and > > then standbys confirmation) should be outside the for-loop in > > WalSndWaitForWal(), but yes we do need both the comments. Attached a > > patch (.txt) for comments improvement, please merge if appropriate. > > Thanks, I have slightly modified the top-up patch and merged it. > > Attach the V106 patch which addressed above and Peter's comments[1]. > I have one question about PhysicalWakeupLogicalWalSnd(): +/* + * Wake up the logical walsender processes with logical failover slots if the + * currently acquired physical slot is specified in standby_slot_names GUC. + */ +void +PhysicalWakeupLogicalWalSnd(void) +{ + List *standby_slots; + + Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot)); + + standby_slots = GetStandbySlotList(); + + foreach_ptr(char, name, standby_slots) + { + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) + { + ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv); + return; + } + } +} IIUC walsender calls this function every time after updating the slot's restart_lsn, which could be very frequently. I'm concerned that it could be expensive to do a linear search on the standby_slot_names list every time. Is it possible to cache the information in walsender local somehow? For example, the walsender sets a flag in WalSnd after processing the config file if its slot name is present in standby_slot_names. That way, they can wake up logical walsenders if eligible after updating the slot's restart_lsn, without checking the standby_slot_names value. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
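As an illustration of the suggested caching, a minimal sketch could look like the following (the flag and helper names are hypothetical and not part of the patch). The membership check runs only when the configuration is (re)loaded, so the per-update wakeup path reduces to a boolean test:

/*
 * Hypothetical sketch of the suggested optimization; the flag and helper
 * are invented names.  The membership test is redone only when the
 * configuration is reloaded, not on every restart_lsn update.
 */
static bool am_in_standby_slot_names = false;

static void
RecomputeStandbySlotMembership(void)
{
    List       *standby_slots = GetStandbySlotList();

    am_in_standby_slot_names = false;

    foreach_ptr(char, name, standby_slots)
    {
        if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0)
        {
            am_in_standby_slot_names = true;
            break;
        }
    }
}

/* The hot path then avoids the list walk entirely. */
void
PhysicalWakeupLogicalWalSnd(void)
{
    Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot));

    if (am_in_standby_slot_names)
        ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv);
}

(As the follow-up messages note, the list is expected to stay short, so this remained a possible future optimization rather than part of the committed patch.)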
On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: Hi, > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > > > > > > --- > > > +void > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > + List *standby_slots; > > > + MemoryContext oldcxt; > > > + char *standby_slot_names_cpy = extra; > > > + > > > > > > Given that the newval and extra have the same data > > > (standby_slot_names value), why do we not use newval instead? I > > > think that if we use newval, we don't need to guc_strdup() in > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > instead, though. It's not clear to me as there is no comment. > > > > I think SplitIdentifierString will modify the passed in string, so > > we'd better not pass the newval to it, otherwise the stored guc > > string(standby_slot_names) will be changed. I can see we are doing > > similar thing in other GUC check/assign function as well. > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > Why does it have to be a List in the first place? I thought the List type is convenient to use here, as we have existing list build function(SplitIdentifierString), and have convenient list macro to loop the list(foreach_ptr) which can save some codes. > In earlier version patches, we > used to copy the list and delete the element until it became empty, while > waiting for physical wal senders. But we now just refer to each slot name in the > list. The current code assumes that stnadby_slot_names_cpy is allocated in > GUCMemoryContext but once it changes, it will silently get broken. I think we > can check and assign standby_slot_names in a similar way to > check/assign_temp_tablespaces and > check/assign_synchronous_standby_names. Yes, we could do follow it by allocating an array and copy each slot name into it, but it also requires some codes to build and scan the array. So, is it possible to expose the GucMemorycontext or have an API like guc_copy_list instead ? If we don't want to touch the guc api, I am ok with using an array as well. Best Regards, Hou zj
On Wed, Mar 6, 2024 at 7:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Mar 5, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > I have one question about PhysicalWakeupLogicalWalSnd(): > > +/* > + * Wake up the logical walsender processes with logical failover slots if the > + * currently acquired physical slot is specified in standby_slot_names GUC. > + */ > +void > +PhysicalWakeupLogicalWalSnd(void) > +{ > + List *standby_slots; > + > + Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot)); > + > + standby_slots = GetStandbySlotList(); > + > + foreach_ptr(char, name, standby_slots) > + { > + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) > + { > + > ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv); > + return; > + } > + } > +} > > IIUC walsender calls this function every time after updating the > slot's restart_lsn, which could be very frequently. I'm concerned that > it could be expensive to do a linear search on the standby_slot_names > list every time. Is it possible to cache the information in walsender > local somehow? > We can cache this information for WalSender but not for the case where users use pg_physical_replication_slot_advance(). We don't expect this list to be long enough to matter, so we can leave this optimization for the future especially if we encounter any such case unless you think otherwise. -- With Regards, Amit Kapila.
On Wed, Mar 6, 2024 at 12:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 6, 2024 at 7:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Mar 5, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > I have one question about PhysicalWakeupLogicalWalSnd(): > > > > +/* > > + * Wake up the logical walsender processes with logical failover slots if the > > + * currently acquired physical slot is specified in standby_slot_names GUC. > > + */ > > +void > > +PhysicalWakeupLogicalWalSnd(void) > > +{ > > + List *standby_slots; > > + > > + Assert(MyReplicationSlot && SlotIsPhysical(MyReplicationSlot)); > > + > > + standby_slots = GetStandbySlotList(); > > + > > + foreach_ptr(char, name, standby_slots) > > + { > > + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) > > + { > > + > > ConditionVariableBroadcast(&WalSndCtl->wal_confirm_rcv_cv); > > + return; > > + } > > + } > > +} > > > > IIUC walsender calls this function every time after updating the > > slot's restart_lsn, which could be very frequently. I'm concerned that > > it could be expensive to do a linear search on the standby_slot_names > > list every time. Is it possible to cache the information in walsender > > local somehow? > > > > We can cache this information for WalSender but not for the case where > users use pg_physical_replication_slot_advance(). We don't expect this > list to be long enough to matter, so we can leave this optimization > for the future especially if we encounter any such case unless you > think otherwise. Okay, agreed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Mar 1, 2024 at 3:22 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > ... > > + /* > > + * "*" is not accepted as in that case primary will not be able to know > > + * for which all standbys to wait for. Even if we have physical slots > > + * info, there is no way to confirm whether there is any standby > > + * configured for the known physical slots. > > + */ > > + if (strcmp(*newval, "*") == 0) > > + { > > + GUC_check_errdetail("\"*\" is not accepted for > > standby_slot_names"); > > + return false; > > + } > > > > Why only '*' is checked aside from validate_standby_slots()? I think > > that the doc doesn't mention anything about '*' and '*' cannot be used > > as a replication slot name. So even if we don't have this check, it > > might be no problem. > > > > Hi, a while ago I asked this same question. See [1 #28] for the response.. Thanks. Quoting the response from the email: SplitIdentifierString() does not give error for '*' and '*' can be considered as valid value which if accepted can mislead user that all the standbys's slots are now considered, which is not the case here. So we want to explicitly call out this case i.e. '*' is not accepted as valid value for standby_slot_names. IIUC we're concerned with a case like where the user confused standby_slot_names values with synchronous_standby_names values. Which means we would need to keep thath check consistent with available values of synchronous_standby_names. For example, if we support a regexp for synchronous_standby_names, we will have to update the check so we disallow other special characters. Also, if we add a new replication-related parameter that accepts other special characters as the value in the future, will we want to raise an error also for such values in check_standby_slot_names()? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Mar 6, 2024 at 12:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Mar 1, 2024 at 3:22 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > ... > > > + /* > > > + * "*" is not accepted as in that case primary will not be able to know > > > + * for which all standbys to wait for. Even if we have physical slots > > > + * info, there is no way to confirm whether there is any standby > > > + * configured for the known physical slots. > > > + */ > > > + if (strcmp(*newval, "*") == 0) > > > + { > > > + GUC_check_errdetail("\"*\" is not accepted for > > > standby_slot_names"); > > > + return false; > > > + } > > > > > > Why only '*' is checked aside from validate_standby_slots()? I think > > > that the doc doesn't mention anything about '*' and '*' cannot be used > > > as a replication slot name. So even if we don't have this check, it > > > might be no problem. > > > > > > > Hi, a while ago I asked this same question. See [1 #28] for the response.. > > Thanks. Quoting the response from the email: > > SplitIdentifierString() does not give error for '*' and '*' can be considered > as valid value which if accepted can mislead user that all the standbys's slots > are now considered, which is not the case here. So we want to explicitly call > out this case i.e. '*' is not accepted as valid value for standby_slot_names. > > IIUC we're concerned with a case like where the user confused > standby_slot_names values with synchronous_standby_names values. Which > means we would need to keep thath check consistent with available > values of synchronous_standby_names. > Both have different formats to specify. For example, for synchronous_standby_names we have the following kind of syntax to specify: [FIRST] num_sync ( standby_name [, ...] ) ANY num_sync ( standby_name [, ...] ) standby_name [, ...] I don't think we can have a common check for both of them as the specifications are different. In fact, I don't think we need a special check for '*'. The user will anyway get a WARNING at a later point that the replication slot with that name doesn't exist. -- With Regards, Amit Kapila.
On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > Hi, > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> > > wrote: > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > --- > > > > +void > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > + List *standby_slots; > > > > + MemoryContext oldcxt; > > > > + char *standby_slot_names_cpy = extra; > > > > + > > > > > > > > Given that the newval and extra have the same data > > > > (standby_slot_names value), why do we not use newval instead? I > > > > think that if we use newval, we don't need to guc_strdup() in > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > instead, though. It's not clear to me as there is no comment. > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > we'd better not pass the newval to it, otherwise the stored guc > > > string(standby_slot_names) will be changed. I can see we are doing > > > similar thing in other GUC check/assign function as well. > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > Why does it have to be a List in the first place? > > I thought the List type is convenient to use here, as we have existing list build > function(SplitIdentifierString), and have convenient list macro to loop the > list(foreach_ptr) which can save some codes. > > > In earlier version patches, we > > used to copy the list and delete the element until it became empty, > > while waiting for physical wal senders. But we now just refer to each > > slot name in the list. The current code assumes that > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > changes, it will silently get broken. I think we can check and assign > > standby_slot_names in a similar way to check/assign_temp_tablespaces > > and check/assign_synchronous_standby_names. > > Yes, we could do follow it by allocating an array and copy each slot name into it, > but it also requires some codes to build and scan the array. So, is it possible to > expose the GucMemorycontext or have an API like guc_copy_list instead ? > If we don't want to touch the guc api, I am ok with using an array as well. I rethink about this and realize that it's not good to do the memory allocation in assign hook function. As the "src/backend/utils/misc/README" said, we'd better do that in check hook function and pass it via extra to assign hook function. And thus array is a good choice in this case rather than a List which cannot be passed to *extra. Here is the V107 patch set which parse and cache the standby slot names in an array instead of a List. Best Regards, Hou zj
Attachment
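To make the check-hook/assign-hook split described above concrete, here is a rough sketch of the pattern (simplified: error handling for duplicates and non-existent slots is omitted, and the names follow the naming settled on later in this thread). The check hook parses the value, flattens the names into a single guc_malloc'd chunk returned via *extra, and the assign hook merely adopts that chunk without allocating anything:

/* Flat representation that fits in one guc_malloc'd chunk. */
typedef struct
{
    int         nslotnames;     /* number of names that follow */
    char        slot_names[FLEXIBLE_ARRAY_MEMBER];  /* consecutive NUL-terminated names */
} StandbySlotNamesConfigData;

static StandbySlotNamesConfigData *standby_slot_names_config;

bool
check_standby_slot_names(char **newval, void **extra, GucSource source)
{
    char       *rawname = pstrdup(*newval); /* SplitIdentifierString scribbles on its input */
    List       *elemlist = NIL;
    StandbySlotNamesConfigData *config;
    char       *ptr;

    if (!SplitIdentifierString(rawname, ',', &elemlist))
    {
        GUC_check_errdetail("List syntax is invalid.");
        pfree(rawname);
        list_free(elemlist);
        return false;
    }

    /*
     * Flatten the parsed names into one chunk handed back via *extra.
     * Splitting only ever shortens the individual names, so strlen(*newval)
     * + 1 bytes are enough to hold all of them.
     */
    config = (StandbySlotNamesConfigData *)
        guc_malloc(LOG, offsetof(StandbySlotNamesConfigData, slot_names) +
                   strlen(*newval) + 1);
    if (config == NULL)
    {
        pfree(rawname);
        list_free(elemlist);
        return false;
    }

    config->nslotnames = list_length(elemlist);
    ptr = config->slot_names;
    foreach_ptr(char, name, elemlist)
    {
        strcpy(ptr, name);
        ptr += strlen(name) + 1;
    }

    pfree(rawname);
    list_free(elemlist);
    *extra = config;
    return true;
}

void
assign_standby_slot_names(const char *newval, void *extra)
{
    /* No allocation here; just adopt the chunk built by the check hook. */
    standby_slot_names_config = (StandbySlotNamesConfigData *) extra;
}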
On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > > <sawada.mshk@gmail.com> wrote: > > > > Hi, > > > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> > > > wrote: > > > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > > <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > --- > > > > > +void > > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > > + List *standby_slots; > > > > > + MemoryContext oldcxt; > > > > > + char *standby_slot_names_cpy = extra; > > > > > + > > > > > > > > > > Given that the newval and extra have the same data > > > > > (standby_slot_names value), why do we not use newval instead? I > > > > > think that if we use newval, we don't need to guc_strdup() in > > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > > instead, though. It's not clear to me as there is no comment. > > > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > > we'd better not pass the newval to it, otherwise the stored guc > > > > string(standby_slot_names) will be changed. I can see we are doing > > > > similar thing in other GUC check/assign function as well. > > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > > > Why does it have to be a List in the first place? > > > > I thought the List type is convenient to use here, as we have existing > > list build function(SplitIdentifierString), and have convenient list > > macro to loop the > > list(foreach_ptr) which can save some codes. > > > > > In earlier version patches, we > > > used to copy the list and delete the element until it became empty, > > > while waiting for physical wal senders. But we now just refer to > > > each slot name in the list. The current code assumes that > > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > > changes, it will silently get broken. I think we can check and > > > assign standby_slot_names in a similar way to > > > check/assign_temp_tablespaces and > check/assign_synchronous_standby_names. > > > > Yes, we could do follow it by allocating an array and copy each slot > > name into it, but it also requires some codes to build and scan the > > array. So, is it possible to expose the GucMemorycontext or have an API like > guc_copy_list instead ? > > If we don't want to touch the guc api, I am ok with using an array as well. > > I rethink about this and realize that it's not good to do the memory allocation in > assign hook function. As the "src/backend/utils/misc/README" said, we'd > better do that in check hook function and pass it via extra to assign hook > function. And thus array is a good choice in this case rather than a List which > cannot be passed to *extra. > > Here is the V107 patch set which parse and cache the standby slot names in an > array instead of a List. The patch needs to be rebased due to recent commit. Attach the V107_2 path set. There are no code changes in this version. Best Regards, Hou zj
Attachment
Here are some review comments for v107-0001 ====== src/backend/replication/slot.c 1. +/* + * Struct for the configuration of standby_slot_names. + * + * Note: this must be a flat representation that can be held in a single chunk + * of guc_malloc'd memory, so that it can be stored as the "extra" data for the + * standby_slot_names GUC. + */ +typedef struct +{ + int slot_num; + + /* slot_names contains nmembers consecutive nul-terminated C strings */ + char slot_names[FLEXIBLE_ARRAY_MEMBER]; +} StandbySlotConfigData; + 1a. To avoid any ambiguity this 1st field is somehow a slot ID number, I felt a better name would be 'nslotnames' or even just 'n' or 'count', ~ 1b. (fix typo) SUGGESTION for the 2nd field comment slot_names is a chunk of 'n' X consecutive null-terminated C strings ~ 1c. A more explanatory name for this typedef maybe is 'StandbySlotNamesConfigData' ? ~~~ 2. +/* This is parsed and cached configuration for standby_slot_names */ +static StandbySlotConfigData *standby_slot_config; 2a. /This is parsed and cached configuration for .../This is the parsed and cached configuration for .../ ~ 2b. Similar to above -- since this only has name information maybe it is more correct to call it 'standby_slot_names_config'? ~~~ 3. +/* + * A helper function to validate slots specified in GUC standby_slot_names. + * + * The rawname will be parsed, and the parsed result will be saved into + * *elemlist. + */ +static bool +validate_standby_slots(char *rawname, List **elemlist) /and the parsed result/and the result/ ~~~ 4. check_standby_slot_names + /* Need a modifiable copy of string */ + rawname = pstrdup(*newval); /copy of string/copy of the GUC string/ ~~~ 5. +assign_standby_slot_names(const char *newval, void *extra) +{ + /* + * The standby slots may have changed, so we must recompute the oldest + * LSN. + */ + ss_oldest_flush_lsn = InvalidXLogRecPtr; + + standby_slot_config = (StandbySlotConfigData *) extra; +} To avoid leaking don't we need to somewhere take care to free any memory used by a previous value (if any) of this 'standby_slot_config'? ~~~ 6. AcquiredStandbySlot +/* + * Return true if the currently acquired slot is specified in + * standby_slot_names GUC; otherwise, return false. + */ +bool +AcquiredStandbySlot(void) +{ + const char *name; + + /* Return false if there is no value in standby_slot_names */ + if (standby_slot_config == NULL) + return false; + + name = standby_slot_config->slot_names; + for (int i = 0; i < standby_slot_config->slot_num; i++) + { + if (strcmp(name, NameStr(MyReplicationSlot->data.name)) == 0) + return true; + + name += strlen(name) + 1; + } + + return false; +} 6a. Just checking "(standby_slot_config == NULL)" doesn't seem enough to me, because IIUIC it is possible when 'standby_slot_names' has no value then maybe standby_slot_config is not NULL but standby_slot_config->slot_num is 0. ~ 6b. IMO this function would be tidier written such that the MyReplicationSlot->data.name is passed as a parameter. Then you can name the function more naturally like: IsSlotInStandbySlotNames(const char *slot_name) ~ 6c. IMO the body of the function will be tidier if written so there are only 2 returns instead of 3 like SUGGESTION: if (...) { for (...) { ... return true; } } return false; ~~~ 7. + /* + * Don't need to wait for the standbys to catch up if there is no value in + * standby_slot_names. 
+ */ + if (standby_slot_config == NULL) + return true; (similar to a previous review comment) This check doesn't seem enough because IIUIC it is possible when 'standby_slot_names' has no value then maybe standby_slot_config is not NULL but standby_slot_config->slot_num is 0. ~~~ 8. WaitForStandbyConfirmation + /* + * Don't need to wait for the standby to catch up if the current acquired + * slot is not a logical failover slot, or there is no value in + * standby_slot_names. + */ + if (!MyReplicationSlot->data.failover || !standby_slot_config) + return; (similar to a previous review comment) IIUIC it is possible that when 'standby_slot_names' has no value, then standby_slot_config is not NULL but standby_slot_config->slot_num is 0. So shouldn't that be checked too? Perhaps it is convenient to encapsulate this check using some macro: #define StandbySlotNamesHasNoValue() (standby_slot_config = NULL || standby_slot_config->slot_num == 0) ---------- Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Mar 6, 2024 at 6:54 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada > > > <sawada.mshk@gmail.com> wrote: > > > > > > Hi, > > > > > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> > > > > wrote: > > > > > > > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada > > > > <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > > > > --- > > > > > > +void > > > > > > +assign_standby_slot_names(const char *newval, void *extra) { > > > > > > + List *standby_slots; > > > > > > + MemoryContext oldcxt; > > > > > > + char *standby_slot_names_cpy = extra; > > > > > > + > > > > > > > > > > > > Given that the newval and extra have the same data > > > > > > (standby_slot_names value), why do we not use newval instead? I > > > > > > think that if we use newval, we don't need to guc_strdup() in > > > > > > check_standby_slot_names(), we might need to do list_copy_deep() > > > > > > instead, though. It's not clear to me as there is no comment. > > > > > > > > > > I think SplitIdentifierString will modify the passed in string, so > > > > > we'd better not pass the newval to it, otherwise the stored guc > > > > > string(standby_slot_names) will be changed. I can see we are doing > > > > > similar thing in other GUC check/assign function as well. > > > > > (check_wal_consistency_checking/ assign_wal_consistency_checking, > > > > > check_createrole_self_grant/ assign_createrole_self_grant ...). > > > > > > > > Why does it have to be a List in the first place? > > > > > > I thought the List type is convenient to use here, as we have existing > > > list build function(SplitIdentifierString), and have convenient list > > > macro to loop the > > > list(foreach_ptr) which can save some codes. > > > > > > > In earlier version patches, we > > > > used to copy the list and delete the element until it became empty, > > > > while waiting for physical wal senders. But we now just refer to > > > > each slot name in the list. The current code assumes that > > > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it > > > > changes, it will silently get broken. I think we can check and > > > > assign standby_slot_names in a similar way to > > > > check/assign_temp_tablespaces and > > check/assign_synchronous_standby_names. > > > > > > Yes, we could do follow it by allocating an array and copy each slot > > > name into it, but it also requires some codes to build and scan the > > > array. So, is it possible to expose the GucMemorycontext or have an API like > > guc_copy_list instead ? > > > If we don't want to touch the guc api, I am ok with using an array as well. > > > > I rethink about this and realize that it's not good to do the memory allocation in > > assign hook function. As the "src/backend/utils/misc/README" said, we'd > > better do that in check hook function and pass it via extra to assign hook > > function. And thus array is a good choice in this case rather than a List which > > cannot be passed to *extra. > > > > Here is the V107 patch set which parse and cache the standby slot names in an > > array instead of a List. > > The patch needs to be rebased due to recent commit. > > Attach the V107_2 path set. There are no code changes in this version. 
The patch needed to be rebased due to a recent commit. Attached v107_3, there are no code changes in this version. thanks Shveta
Attachment
On Wed, Mar 6, 2024 at 5:53 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Mar 6, 2024 at 12:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Mar 1, 2024 at 3:22 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > On Fri, Mar 1, 2024 at 5:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > ... > > > > + /* > > > > + * "*" is not accepted as in that case primary will not be able to know > > > > + * for which all standbys to wait for. Even if we have physical slots > > > > + * info, there is no way to confirm whether there is any standby > > > > + * configured for the known physical slots. > > > > + */ > > > > + if (strcmp(*newval, "*") == 0) > > > > + { > > > > + GUC_check_errdetail("\"*\" is not accepted for > > > > standby_slot_names"); > > > > + return false; > > > > + } > > > > > > > > Why only '*' is checked aside from validate_standby_slots()? I think > > > > that the doc doesn't mention anything about '*' and '*' cannot be used > > > > as a replication slot name. So even if we don't have this check, it > > > > might be no problem. > > > > > > > > > > Hi, a while ago I asked this same question. See [1 #28] for the response.. > > > > Thanks. Quoting the response from the email: > > > > SplitIdentifierString() does not give error for '*' and '*' can be considered > > as valid value which if accepted can mislead user that all the standbys's slots > > are now considered, which is not the case here. So we want to explicitly call > > out this case i.e. '*' is not accepted as valid value for standby_slot_names. > > > > IIUC we're concerned with a case like where the user confused > > standby_slot_names values with synchronous_standby_names values. Which > > means we would need to keep thath check consistent with available > > values of synchronous_standby_names. > > > > Both have different formats to specify. For example, for > synchronous_standby_names we have the following kind of syntax to > specify: > [FIRST] num_sync ( standby_name [, ...] ) > ANY num_sync ( standby_name [, ...] ) > standby_name [, ...] > > I don't think we can have a common check for both of them as the > specifications are different. In fact, I don't think we need a special > check for '*'. I think so too. > The user will anyway get a WARNING at a later point > that the replication slot with that name doesn't exist. Right. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Mar 7, 2024 at 7:35 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v107-0001 > > ====== > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat representation that can be held in a single chunk > + * of guc_malloc'd memory, so that it can be stored as the "extra" data for the > + * standby_slot_names GUC. > + */ > +typedef struct > +{ > + int slot_num; > + > + /* slot_names contains nmembers consecutive nul-terminated C strings */ > + char slot_names[FLEXIBLE_ARRAY_MEMBER]; > +} StandbySlotConfigData; > + > > 1a. > To avoid any ambiguity this 1st field is somehow a slot ID number, I > felt a better name would be 'nslotnames' or even just 'n' or 'count', > We can probably just add a comment above slot_num and that should be sufficient but I am fine with 'nslotnames' as well, in anycase let's add a comment for the same. > > 6b. > IMO this function would be tidier written such that the > MyReplicationSlot->data.name is passed as a parameter. Then you can > name the function more naturally like: > > IsSlotInStandbySlotNames(const char *slot_name) > +1. How about naming it as SlotExistsinStandbySlotNames(char *slot_name) and pass the slot_name from MyReplicationSlot? Otherwise, we need an Assert for MyReplicationSlot in this function. Also, can we add a comment like below before the loop: + /* + * XXX: We are not expecting this list to be long so a linear search + * shouldn't hurt but if that turns out not to be true then we can cache + * this information for each WalSender as well. + */ -- With Regards, Amit Kapila.
On Thu, Mar 7, 2024 at 8:37 AM shveta malik <shveta.malik@gmail.com> wrote: > I thought about whether we can make standby_slot_names USERSET instead of SIGHUP, and it doesn't sound like a good idea, as that can lead to inconsistent standby replicas even after configuring the correct value of standby_slot_names. One could set a different or '' (empty) value for a particular session and consume all changes from the slot without waiting for the standby to acknowledge the change. Also, it would be difficult for users to ensure that the standby is always ahead of the subscribers. Does anyone think differently? -- With Regards, Amit Kapila.
On Thursday, March 7, 2024 10:05 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for v107-0001 Thanks for the comments. > > ====== > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat representation that can be held in a > +single chunk > + * of guc_malloc'd memory, so that it can be stored as the "extra" data > +for the > + * standby_slot_names GUC. > + */ > +typedef struct > +{ > + int slot_num; > + > + /* slot_names contains nmembers consecutive nul-terminated C strings > +*/ char slot_names[FLEXIBLE_ARRAY_MEMBER]; > +} StandbySlotConfigData; > + > > 1a. > To avoid any ambiguity this 1st field is somehow a slot ID number, I felt a better > name would be 'nslotnames' or even just 'n' or 'count', Changed to 'nslotnames'. > > ~ > > 1b. > (fix typo) > > SUGGESTION for the 2nd field comment > slot_names is a chunk of 'n' X consecutive null-terminated C strings Changed. > > ~ > > 1c. > A more explanatory name for this typedef maybe is > 'StandbySlotNamesConfigData' ? Changed. > > ~~~ > > > 2. > +/* This is parsed and cached configuration for standby_slot_names */ > +static StandbySlotConfigData *standby_slot_config; > > > 2a. > /This is parsed and cached configuration for .../This is the parsed and cached > configuration for .../ Changed. > > ~ > > 2b. > Similar to above -- since this only has name information maybe it is more > correct to call it 'standby_slot_names_config'? > Changed. > ~~~ > > 3. > +/* > + * A helper function to validate slots specified in GUC standby_slot_names. > + * > + * The rawname will be parsed, and the parsed result will be saved into > + * *elemlist. > + */ > +static bool > +validate_standby_slots(char *rawname, List **elemlist) > > /and the parsed result/and the result/ > Changed. > ~~~ > > 4. check_standby_slot_names > > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > > /copy of string/copy of the GUC string/ > Changed. > ~~~ > > 5. > +assign_standby_slot_names(const char *newval, void *extra) { > + /* > + * The standby slots may have changed, so we must recompute the oldest > + * LSN. > + */ > + ss_oldest_flush_lsn = InvalidXLogRecPtr; > + > + standby_slot_config = (StandbySlotConfigData *) extra; } > > To avoid leaking don't we need to somewhere take care to free any memory > used by a previous value (if any) of this 'standby_slot_config'? > The memory of extra is maintained by the GUC mechanism. It will be automatically freed when the associated GUC setting is no longer of interest. See src/backend/utils/misc/README for details. > ~~~ > > 6. AcquiredStandbySlot > > +/* > + * Return true if the currently acquired slot is specified in > + * standby_slot_names GUC; otherwise, return false. > + */ > +bool > +AcquiredStandbySlot(void) > +{ > + const char *name; > + > + /* Return false if there is no value in standby_slot_names */ if > + (standby_slot_config == NULL) return false; > + > + name = standby_slot_config->slot_names; for (int i = 0; i < > + standby_slot_config->slot_num; i++) { if (strcmp(name, > + NameStr(MyReplicationSlot->data.name)) == 0) return true; > + > + name += strlen(name) + 1; > + } > + > + return false; > +} > > 6a. > Just checking "(standby_slot_config == NULL)" doesn't seem enough to me, > because IIUIC it is possible when 'standby_slot_names' has no value then > maybe standby_slot_config is not NULL but standby_slot_config->slot_num is > 0. 
The standby_slot_config will always be NULL if there is no value in it. While checking, I did find a rare case that if there are only some white space in the standby_slot_names, then slot_num will be 0, and have fixed it so that standby_slot_config will always be NULL if there is no meaning value in guc. > > ~ > > 6b. > IMO this function would be tidier written such that the > MyReplicationSlot->data.name is passed as a parameter. Then you can > name the function more naturally like: > > IsSlotInStandbySlotNames(const char *slot_name) Changed it to SlotExistsInStandbySlotNames. > > ~ > > 6c. > IMO the body of the function will be tidier if written so there are only 2 returns > instead of 3 like > > SUGGESTION: > if (...) > { > for (...) > { > ... > return true; > } > } > return false; I personally prefer the current style. > > ~~~ > > 7. > + /* > + * Don't need to wait for the standbys to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (standby_slot_config == NULL) > + return true; > > (similar to a previous review comment) > > This check doesn't seem enough because IIUIC it is possible when > 'standby_slot_names' has no value then maybe standby_slot_config is not NULL > but standby_slot_config->slot_num is 0. Same as above. > > ~~~ > > 8. WaitForStandbyConfirmation > > + /* > + * Don't need to wait for the standby to catch up if the current > + acquired > + * slot is not a logical failover slot, or there is no value in > + * standby_slot_names. > + */ > + if (!MyReplicationSlot->data.failover || !standby_slot_config) return; > > (similar to a previous review comment) > > IIUIC it is possible that when 'standby_slot_names' has no value, then > standby_slot_config is not NULL but standby_slot_config->slot_num is 0. So > shouldn't that be checked too? > > Perhaps it is convenient to encapsulate this check using some macro: > #define StandbySlotNamesHasNoValue() (standby_slot_config = NULL || > standby_slot_config->slot_num == 0) Same as above, I think we can avoid checking slot_num. Best Regards, Hou zj
On Thursday, March 7, 2024 12:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 7, 2024 at 7:35 AM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > Here are some review comments for v107-0001 > > > > ====== > > src/backend/replication/slot.c > > > > 1. > > +/* > > + * Struct for the configuration of standby_slot_names. > > + * > > + * Note: this must be a flat representation that can be held in a single chunk > > + * of guc_malloc'd memory, so that it can be stored as the "extra" data for > the > > + * standby_slot_names GUC. > > + */ > > +typedef struct > > +{ > > + int slot_num; > > + > > + /* slot_names contains nmembers consecutive nul-terminated C strings */ > > + char slot_names[FLEXIBLE_ARRAY_MEMBER]; > > +} StandbySlotConfigData; > > + > > > > 1a. > > To avoid any ambiguity this 1st field is somehow a slot ID number, I > > felt a better name would be 'nslotnames' or even just 'n' or 'count', > > > > We can probably just add a comment above slot_num and that should be > sufficient but I am fine with 'nslotnames' as well, in anycase let's > add a comment for the same. Added. > > > > > 6b. > > IMO this function would be tidier written such that the > > MyReplicationSlot->data.name is passed as a parameter. Then you can > > name the function more naturally like: > > > > IsSlotInStandbySlotNames(const char *slot_name) > > > > +1. How about naming it as SlotExistsinStandbySlotNames(char > *slot_name) and pass the slot_name from MyReplicationSlot? Otherwise, > we need an Assert for MyReplicationSlot in this function. Changed as suggested. > > Also, can we add a comment like below before the loop: > + /* > + * XXX: We are not expecting this list to be long so a linear search > + * shouldn't hurt but if that turns out not to be true then we can cache > + * this information for each WalSender as well. > + */ Added. Attach the V108 patch set which addressed above and Peter's comments. I also removed the check for "*" in guc check hook. Best Regards, Hou zj
Attachment
On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > Attach the V108 patch set which addressed above and Peter's comments. > I also removed the check for "*" in guc check hook. > Pushed with minor modifications. I'll keep an eye on the BF. BTW, one thing that we should try to evaluate a bit more is the traversal of slots in StandbySlotsHaveCaughtup(), where we verify whether all the slots mentioned in standby_slot_names have received the required WAL. Even if the standby_slot_names list is short, the total number of slots can be much larger, which can lead to an increase in CPU usage during the traversal. There is an optimization that caches ss_oldest_flush_lsn and ensures that we don't need to traverse the slots each time, so this path may not be hit frequently, but there is still a chance. I see that it is possible to further optimize this area by caching the position of each slot mentioned in standby_slot_names in the replication_slots array, but I am not sure whether it is worth it. -- With Regards, Amit Kapila.
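For reference, the cached-LSN fast path mentioned here has roughly the following shape; this is a simplified fragment, not the verbatim committed code, and the variable names are approximate:

    /*
     * Fast path sketch: ss_oldest_flush_lsn caches the oldest flushed LSN
     * previously confirmed by every slot listed in standby_slot_names, so
     * the per-slot traversal can be skipped entirely when the requested LSN
     * is not beyond it.
     */
    if (!XLogRecPtrIsInvalid(ss_oldest_flush_lsn) &&
        wait_for_lsn <= ss_oldest_flush_lsn)
        return true;

    /* Otherwise fall through to the per-slot traversal and refresh the cache. */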
On Fri, Mar 8, 2024 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
>
> Attach the V108 patch set which addressed above and Peter's comments.
> I also removed the check for "*" in guc check hook.
>
Pushed with minor modifications. I'll keep an eye on BF.
BTW, one thing that we should try to evaluate a bit more is the
traversal of slots in StandbySlotsHaveCaughtup() where we verify if
all the slots mentioned in standby_slot_names have received the
required WAL. Even if the standby_slot_names list is short the total
number of slots can be much larger which can lead to an increase in
CPU usage during traversal. There is an optimization that allows to
cache ss_oldest_flush_lsn and ensures that we don't need to traverse
the slots each time so it may not hit frequently but still there is a
chance. I see it is possible to further optimize this area by caching
the position of each slot mentioned in standby_slot_names in
replication_slots array but not sure whether it is worth.
I tried to test this by configuring a large number of logical slots while making sure the standby slots are at the end of the array and checking if there was any performance hit in logical replication from these searches.
Setup:
1. 1 primary server configured with 3 slots in standby_slot_names, 1 extra logical slot (not configured for failover) + 1 logical subscriber configured as failover + 3 physical standbys (all configured to sync logical slots)
2. 1 primary server configured with 3 slots in standby_slot_names, 100 extra logical slots (not configured for failover) + 1 logical subscriber configured as failover + 3 physical standbys (all configured to sync logical slots)
3. 1 primary server configured with 3 slots in standby_slot_names, 500 extra logical slots (not configured for failover) + 1 logical subscriber configured as failover + 3 physical standbys (all configured to sync logical slots)
In the three setups, the 3 slots in standby_slot_names are compared with lists of 2, 101, and 501 slots respectively.
I ran pgbench for 15 minutes for all 3 setups:
Case 1: Average TPS - 8.143399
Case 2: Average TPS - 8.187462
Case 3: Average TPS - 8.190611
I see no degradation in performance; the differences are well within the run-to-run variation seen.
Nisha also did some performance tests to record the lag introduced by traversing a large number of slots in StandbySlotsHaveCaughtup(). The tests logged the time at the start and end of the XLogSendLogical() call (which eventually calls WalSndWaitForWal() --> StandbySlotsHaveCaughtup()) and calculated the total time taken by this function during the load run for different total slot counts.
Setup:
--one primary with 3 standbys and one subscriber with one active subscription
--hot_standby_feedback=off and sync_replication_slots=false
--made sure the standby slots remain at the end of the ReplicationSlotCtl->replication_slots array, to measure the worst-case performance of the standby slot search in StandbySlotsHaveCaughtup()
pgbench for 15 min was run. Here is the data:
Case1 : with 1 logical slot, standby_slot_names having 3 slots
Run1: 626.141642 secs
Run2: 631.930254 secs
Case2 : with 100 logical slots, standby_slot_names having 3 slots
Run1: 629.38332 secs
Run2: 630.548432 secs
Case3 : with 500 logical slots, standby_slot_names having 3 slots
Run1: 629.910829 secs
Run2: 627.924183 secs
There was no degradation in performance seen.
Thanks Nisha for helping with the testing.
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian <itsajin@gmail.com> wrote: > >> Pushed with minor modifications. I'll keep an eye on BF. >> >> BTW, one thing that we should try to evaluate a bit more is the >> traversal of slots in StandbySlotsHaveCaughtup() where we verify if >> all the slots mentioned in standby_slot_names have received the >> required WAL. Even if the standby_slot_names list is short the total >> number of slots can be much larger which can lead to an increase in >> CPU usage during traversal. There is an optimization that allows to >> cache ss_oldest_flush_lsn and ensures that we don't need to traverse >> the slots each time so it may not hit frequently but still there is a >> chance. I see it is possible to further optimize this area by caching >> the position of each slot mentioned in standby_slot_names in >> replication_slots array but not sure whether it is worth. >> >> > > I tried to test this by configuring a large number of logical slots while making sure the standby slots are at the endof the array and checking if there was any performance hit in logical replication from these searches. > Thanks Ajin and Nisha. We also plan: 1) Redoing XLogSendLogical time-log related test with 'sync_replication_slots' enabled. 2) pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup() for a large number of slots. 3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the report when there are a large number of slots to traverse. thanks Shveta
On Friday, March 8, 2024 1:09 PM shveta malik <shveta.malik@gmail.com> wrote: > On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > >> Pushed with minor modifications. I'll keep an eye on BF. > >> > >> BTW, one thing that we should try to evaluate a bit more is the > >> traversal of slots in StandbySlotsHaveCaughtup() where we verify if > >> all the slots mentioned in standby_slot_names have received the > >> required WAL. Even if the standby_slot_names list is short the total > >> number of slots can be much larger which can lead to an increase in > >> CPU usage during traversal. There is an optimization that allows to > >> cache ss_oldest_flush_lsn and ensures that we don't need to traverse > >> the slots each time so it may not hit frequently but still there is a > >> chance. I see it is possible to further optimize this area by caching > >> the position of each slot mentioned in standby_slot_names in > >> replication_slots array but not sure whether it is worth. > >> > >> > > > > I tried to test this by configuring a large number of logical slots while making > sure the standby slots are at the end of the array and checking if there was any > performance hit in logical replication from these searches. > > >

Thanks Nisha for conducting some additional tests and discussing with me internally. We have collected the performance data on HEAD. Basically, we don't see a noticeable difference in the performance data, and StandbySlotsHaveCaughtup also does not stand out in the profile. Here are the details:

> 1) Redoing XLogSendLogical time-log related test with
> 'sync_replication_slots' enabled.

Setup:
- one primary + 3 standbys + one subscriber with one active subscription
- ran 15 min pgbench for all cases
- hot_standby_feedback=ON and sync_replication_slots=TRUE
(To maximize the impact of SearchNamedReplicationSlot, the standby slot is at the end of the ReplicationSlotCtl->replication_slots array in each test)

Case1 - 1 slot: 895.305565 secs
Case2 - 100 slots: 894.936039 secs
Case3 - 500 slots: 895.256412 secs

> 2) pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup() for a
> large number of slots.

We reran the XLogSendLogical() wait time analysis tests.

Setup:
- One primary node and 3 standby nodes
- Created logical slots using "test_decoding" and activated one walsender by running pg_recvlogical on one slot.
- hot_standby_feedback=ON and sync_replication_slots=TRUE
- Did one run for each case with pgbench for 15 min
(To maximize the impact of SearchNamedReplicationSlot, the standby slot is at the end of the ReplicationSlotCtl->replication_slots array in each test)

Case1 - 1 slot: 894.83775 secs
Case2 - 100 slots: 894.449356 secs
Case3 - 500 slots: 894.98479 secs

There is no noticeable regression when the number of replication slots increases.

> 3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the report
> when there are a large number of slots to traverse.

The setup is the same as 2). To maximize the impact of SearchNamedReplicationSlot, the standby slot is at the end of the ReplicationSlotCtl->replication_slots array. StandbySlotsHaveCaughtup is not noticeable in the profile:

0.03% 0.00% postgres postgres [.] StandbySlotsHaveCaughtup

After some investigation, it appears that the cached 'ss_oldest_flush_lsn' plays a crucial role in optimizing this workload, effectively reducing the need for frequent strcmp operations within the loop.
To test the impact of frequent strcmp calls, we conducted a test by removing the 'ss_oldest_flush_lsn' check and re-evaluating the profile. This time, although the profile indicated a small increase in the StandbySlotsHaveCaughtup metric, it still does not raise significant concerns. --1.47%--NeedToWaitForWal | NeedToWaitForStandbys | StandbySlotsHaveCaughtup | | | --0.96%--SearchNamedReplicationSlot The scripts that were used to setup the test environment for all above tests are attached. The machine configuration for above tests is as follows: CPU : E7-4890v2(2.8Ghz/15core)×4 MEM : 768GB HDD : 600GB×2 OS : RHEL 7.9 While no noticeable overhead was observed in the SearchNamedReplicationSlot operation, we explored a strategy to enhance efficiency by minimizing the search for standby slots within the loop. The idea is to cache the position of each standby slot within ReplicationSlotCtl->replication_slots. We will reference the slot directly through ReplicationSlotCtl->replication_slots[index]. If the slot name matches, we will perform other checks including the restart_lsn; otherwise, SearchNamedReplicationSlot is invoked to update the index cache accordingly. This optimization can reduce the cost from O(n*m) to O(n). Note that since we didn't see the overhead in the test, I am not proposing to push this patch now. But just share the idea and a small patch in case anyone came across a workload where performance impact of SearchNamedReplicationSlot becomes noticeable. Best Regards, Hou zj
Attachment
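For readers who want to approximate this kind of setup, a minimal sketch is given below. It is not taken from the attached scripts: the slot names (filler_slot_N, sb1_slot) are made up, max_replication_slots has to be raised accordingly, and it assumes no slots have been dropped so that creation order roughly matches the order of the ReplicationSlotCtl->replication_slots array.

-- create many logical slots first, so the standby's slot lands late in the array
SELECT pg_create_logical_replication_slot('filler_slot_' || g, 'test_decoding')
FROM generate_series(1, 499) AS g;

-- create the physical slot listed in standby_slot_names last
SELECT pg_create_physical_replication_slot('sb1_slot');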
Hi, Since the standby_slot_names patch has been committed, I am attaching the last doc patch for review. Best Regards, Hou zj
Attachment
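The attached doc patch is not reproduced in the thread, but the failover-readiness procedure it describes boils down to two queries along the following lines (a sketch only; 'sub1' stands in for an actual subscription slot name). First, on the subscriber, collect the replication slot names of the failover-enabled subscriptions; then, on the standby, check that each of those slots has been synchronized and is usable:

-- on the subscriber
SELECT array_agg(quote_literal(s.subslotname)) AS slots
FROM pg_subscription s
WHERE s.subfailover;

-- on the standby, for each slot name returned above
SELECT slot_name,
       (synced AND NOT temporary AND NOT conflicting) AS failover_ready
FROM pg_replication_slots
WHERE slot_name IN ('sub1');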
Hi, On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > Hi, > > Since the standby_slot_names patch has been committed, I am attaching the last > doc patch for review. > Thanks! 1 === + continue subscribing to publications now on the new primary server without + any data loss. I think "without any data loss" should be re-worded in this context. Data loss in the sense of "data committed on the primary and not visible on the subscriber in case of failover" can still occur (in case synchronous replication is not used). 2 === + If the result (<literal>failover_ready</literal>) of both above steps is + true, existing subscriptions will be able to continue without data loss. + </para> I don't think that's true if synchronous replication is not used. Say, - synchronous replication is not used - primary is not able to reach the standby anymore and standby_slot_names is set - new data is inserted into the primary - then not replicated to the subscriber (due to standby_slot_names) Then I think both the above steps will return true but data would be lost in case of failover. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, When analyzing one BF error[1], we found an issue with slotsync: Since we don't perform logical decoding for the synced slots when syncing the lsn/xmin of a slot, no logical snapshots will be serialized to disk. So, when a user starts to use these synced slots after promotion, the consistent snapshot needs to be re-built from the restart_lsn if the WAL (xl_running_xacts) at the restart_lsn position indicates that there are running transactions. This however could cause data from before the consistent point to be missed[2]. This issue doesn't exist on the primary because the snapshot at restart_lsn should have been serialized to disk (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the logical decoding restarts, it can find a consistent snapshot immediately at restart_lsn. To fix this, we could use fast-forward logical decoding to advance the synced slot's lsn/xmin when syncing these values instead of directly updating the slot's info. This way, the snapshot will be serialized to disk when decoding. If we cannot reach the consistent point at the remote restart_lsn, the slot is marked as temporary and will be persisted once it reaches the consistent point. I am still analyzing the fix and will share once ready. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-03-19%2010%3A03%3A06 [2] The steps to reproduce the data-miss issue on a primary->standby setup: Note, we need to set LOG_SNAPSHOT_INTERVAL_MS to a bigger number (1500000) to prevent a concurrent LogStandbySnapshot() call, and enable sync_replication_slots on the standby. 1. Create a failover logical slot on the primary. SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); 2. Use the following steps to advance the restart_lsn of the failover slot to a position where the xl_running_xacts record at that position indicates that there is a running transaction. TXN1 BEGIN; create table dummy1(a int); TXN2 SELECT pg_log_standby_snapshot(); TXN1 COMMIT; TXN1 BEGIN; create table dummy2(a int); TXN2 SELECT pg_log_standby_snapshot(); TXN1 COMMIT; -- the restart_lsn will be advanced to a position where there was 1 running transaction. And we need to wait for the restart_lsn to be synced to the standby. SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn()); -- insert some data here before calling the next pg_log_standby_snapshot(). INSERT INTO reptable VALUES(999); 3. Promote the standby and try to consume the change (999) from the synced slot on the standby. We will find that no change is returned. select * from pg_logical_slot_get_changes('logicalslot', NULL, NULL); Best Regards, Hou zj
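One way to observe the asymmetry described above is to check, on each node, whether a serialized snapshot exists for a slot's restart_lsn. The query below is only an illustrative check: it relies on the implementation detail that snapshots are serialized under pg_logical/snapshots with file names derived from the LSN, which is what pg_ls_logicalsnapdir() lists. On the primary the failover slot's restart_lsn typically has a matching file, while on the standby (before a fix along the lines above) the synced slot does not.

SELECT s.slot_name,
       s.restart_lsn,
       EXISTS (SELECT 1
               FROM pg_ls_logicalsnapdir() f
               WHERE f.name = replace(s.restart_lsn::text, '/', '-') || '.snap') AS snapshot_on_disk
FROM pg_replication_slots s
WHERE s.slot_type = 'logical';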
Hi, On Thu, Mar 28, 2024 at 04:38:19AM +0000, Zhijie Hou (Fujitsu) wrote: > Hi, > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts to > use these synced slots after promotion, it needs to re-build the consistent > snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn > position indicates that there are running transactions. This however could > cause the data that before the consistent point to be missed[2]. I see, nice catch and explanation, thanks! > This issue doesn't exist on the primary because the snapshot at restart_lsn > should have been serialized to disk (SnapBuildProcessRunningXacts -> > SnapBuildSerialize), so even if the logical decoding restarts, it can find > consistent snapshot immediately at restart_lsn. Right. > To fix this, we could use the fast forward logical decoding to advance the synced > slot's lsn/xmin when syncing these values instead of directly updating the > slot's info. This way, the snapshot will be serialized to disk when decoding. > If we could not reach to the consistent point at the remote restart_lsn, the > slot is marked as temp and will be persisted once it reaches the consistent > point. I am still analyzing the fix and will share once ready. Thanks! I'm wondering about the performance impact (even in fast_forward mode), might be worth to keep an eye on it. Should we create a 17 open item [1]? [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts to > use these synced slots after promotion, it needs to re-build the consistent > snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn > position indicates that there are running transactions. This however could > cause the data that before the consistent point to be missed[2]. > > This issue doesn't exist on the primary because the snapshot at restart_lsn > should have been serialized to disk (SnapBuildProcessRunningXacts -> > SnapBuildSerialize), so even if the logical decoding restarts, it can find > consistent snapshot immediately at restart_lsn. > > To fix this, we could use the fast forward logical decoding to advance the synced > slot's lsn/xmin when syncing these values instead of directly updating the > slot's info. This way, the snapshot will be serialized to disk when decoding. > If we could not reach to the consistent point at the remote restart_lsn, the > slot is marked as temp and will be persisted once it reaches the consistent > point. I am still analyzing the fix and will share once ready. > Yes, we can use this but one thing to note is that CreateDecodingContext() will expect that the slot's and current database are the same. I think the reason for that is we need to check system tables of the current database while decoding and sending data to the output_plugin which won't be a requirement for the fast_forward case. So, we need to skip that check in fast_forward mode. Next, I was thinking about the case of the first time updating the restart and confirmed_flush LSN while syncing the slots. I think we can keep the current logic as it is based on the following analysis. For each logical slot, cases possible on the primary: 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached the consistent point. 2. The restart_lsn doesn't have a serialized snapshot but has reached a consistent point. 3. The restart_lsn has a serialized snapshot which means it has reached a consistent point as well. Considering we keep the logic to reserve initial WAL positions the same as the current (Reserve WAL for the currently active local slot using the specified WAL location (restart_lsn). If the given WAL location has been removed, reserve WAL using the oldest existing WAL segment.), I could think of the below scenarios: A. For 1, we shouldn't sync the slot as it still wouldn't have been marked persistent on the primary. B. For 2, we would sync the slot B1. If remote_restart_lsn >= local_resart_lsn, then advance the slot by calling pg_logical_replication_slot_advance(). B11. If we reach consistent point, then it should be okay because after promotion as well we should reach consistent point. B111. But again is it possible that there is some xact that comes before consistent_point on primary and the same is after consistent_point on standby? This shouldn't matter as we will start decoding transactions after confirmed_flush_lsn which would be the same on primary and standby. B22. If we haven't reached consistent_point, then we won't mark the slot as persistent, and at the next sync we will do the same till it reaches consistent_point. At that time, the situation will be similar to B11. B2. 
If remote_restart_lsn < local_restart_lsn, then we will wait for the next sync cycle and keep the slot as temporary. Once in the next or some consecutive sync cycle, we reach the condition remote_restart_lsn >= local_restart_lsn, we will proceed to advance the slot and we should have the same behavior as B1. C. For 3, we would sync the slot, but the behavior should be the same as B. Thoughts? -- With Regards, Amit Kapila.
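The temporary-until-consistent behaviour described in B22/B2 can be observed from SQL while synchronization is in progress; a rough check on the standby (using the synced and temporary columns of pg_replication_slots) is:

-- synced slots that are still temporary have not yet reached a consistent
-- point and are not yet persisted
SELECT slot_name, temporary, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE synced;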
On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Mar 28, 2024 at 04:38:19AM +0000, Zhijie Hou (Fujitsu) wrote: > > > To fix this, we could use the fast forward logical decoding to advance the synced > > slot's lsn/xmin when syncing these values instead of directly updating the > > slot's info. This way, the snapshot will be serialized to disk when decoding. > > If we could not reach to the consistent point at the remote restart_lsn, the > > slot is marked as temp and will be persisted once it reaches the consistent > > point. I am still analyzing the fix and will share once ready. > > Thanks! I'm wondering about the performance impact (even in fast_forward mode), > might be worth to keep an eye on it. > True, we can consider performance but correctness should be a priority, and can we think of a better way to fix this issue? > Should we create a 17 open item [1]? > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > Yes, we can do that. -- With Regards, Amit Kapila.
Hi, On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote: > On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Mar 28, 2024 at 04:38:19AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > To fix this, we could use the fast forward logical decoding to advance the synced > > > slot's lsn/xmin when syncing these values instead of directly updating the > > > slot's info. This way, the snapshot will be serialized to disk when decoding. > > > If we could not reach to the consistent point at the remote restart_lsn, the > > > slot is marked as temp and will be persisted once it reaches the consistent > > > point. I am still analyzing the fix and will share once ready. > > > > Thanks! I'm wondering about the performance impact (even in fast_forward mode), > > might be worth to keep an eye on it. > > > > True, we can consider performance but correctness should be a > priority, Yeah of course. > and can we think of a better way to fix this issue? I'll keep you posted if there is one that I can think of. > > Should we create a 17 open item [1]? > > > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > > > > Yes, we can do that. done. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thursday, March 28, 2024 7:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > When analyzing one BF error[1], we find an issue of slotsync: Since we > > don't perform logical decoding for the synced slots when syncing the > > lsn/xmin of slot, no logical snapshots will be serialized to disk. So, > > when user starts to use these synced slots after promotion, it needs > > to re-build the consistent snapshot from the restart_lsn if the > > WAL(xl_running_xacts) at restart_lsn position indicates that there are > > running transactions. This however could cause the data that before the > consistent point to be missed[2]. > > > > This issue doesn't exist on the primary because the snapshot at > > restart_lsn should have been serialized to disk > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > logical decoding restarts, it can find consistent snapshot immediately at > restart_lsn. > > > > To fix this, we could use the fast forward logical decoding to advance > > the synced slot's lsn/xmin when syncing these values instead of > > directly updating the slot's info. This way, the snapshot will be serialized to > disk when decoding. > > If we could not reach to the consistent point at the remote > > restart_lsn, the slot is marked as temp and will be persisted once it > > reaches the consistent point. I am still analyzing the fix and will share once > ready. > > > > Yes, we can use this but one thing to note is that > CreateDecodingContext() will expect that the slot's and current database are > the same. I think the reason for that is we need to check system tables of the > current database while decoding and sending data to the output_plugin which > won't be a requirement for the fast_forward case. So, we need to skip that > check in fast_forward mode. Agreed. > > Next, I was thinking about the case of the first time updating the restart and > confirmed_flush LSN while syncing the slots. I think we can keep the current > logic as it is based on the following analysis. > > For each logical slot, cases possible on the primary: > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached the > consistent point. > 2. The restart_lsn doesn't have a serialized snapshot but has reached a > consistent point. > 3. The restart_lsn has a serialized snapshot which means it has reached a > consistent point as well. > > Considering we keep the logic to reserve initial WAL positions the same as the > current (Reserve WAL for the currently active local slot using the specified WAL > location (restart_lsn). If the given WAL location has been removed, reserve > WAL using the oldest existing WAL segment.), I could think of the below > scenarios: > A. For 1, we shouldn't sync the slot as it still wouldn't have been marked > persistent on the primary. > B. For 2, we would sync the slot > B1. If remote_restart_lsn >= local_resart_lsn, then advance the slot by calling > pg_logical_replication_slot_advance(). > B11. If we reach consistent point, then it should be okay because after > promotion as well we should reach consistent point. > B111. But again is it possible that there is some xact that comes > before consistent_point on primary and the same is after consistent_point on > standby? This shouldn't matter as we will start decoding transactions after > confirmed_flush_lsn which would be the same on primary and standby. > B22. 
If we haven't reached consistent_point, then we won't mark the slot > as persistent, and at the next sync we will do the same till it reaches > consistent_point. At that time, the situation will be similar to B11. > B2. If remote_restart_lsn < local_restart_lsn, then we will wait for the next > sync cycle and keep the slot as temporary. Once in the next or some > consecutive sync cycle, we reach the condition remote_restart_lsn >= > local_restart_lsn, we will proceed to advance the slot and we should have the > same behavior as B1. > C. For 3, we would sync the slot, but the behavior should be the same as B. > > Thoughts? Looks reasonable to me. Here is the patch based on above lines. I am also testing and verifying the patch locally. Best Regards, Hou zj
Attachment
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, March 28, 2024 7:32 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> > > wrote: > > > > > > When analyzing one BF error[1], we find an issue of slotsync: Since > > > we don't perform logical decoding for the synced slots when syncing > > > the lsn/xmin of slot, no logical snapshots will be serialized to > > > disk. So, when user starts to use these synced slots after > > > promotion, it needs to re-build the consistent snapshot from the > > > restart_lsn if the > > > WAL(xl_running_xacts) at restart_lsn position indicates that there > > > are running transactions. This however could cause the data that > > > before the > > consistent point to be missed[2]. > > > > > > This issue doesn't exist on the primary because the snapshot at > > > restart_lsn should have been serialized to disk > > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > > logical decoding restarts, it can find consistent snapshot > > > immediately at > > restart_lsn. > > > > > > To fix this, we could use the fast forward logical decoding to > > > advance the synced slot's lsn/xmin when syncing these values instead > > > of directly updating the slot's info. This way, the snapshot will be > > > serialized to > > disk when decoding. > > > If we could not reach to the consistent point at the remote > > > restart_lsn, the slot is marked as temp and will be persisted once > > > it reaches the consistent point. I am still analyzing the fix and > > > will share once > > ready. > > > > > > > Yes, we can use this but one thing to note is that > > CreateDecodingContext() will expect that the slot's and current > > database are the same. I think the reason for that is we need to check > > system tables of the current database while decoding and sending data > > to the output_plugin which won't be a requirement for the fast_forward > > case. So, we need to skip that check in fast_forward mode. > > Agreed. > > > > > Next, I was thinking about the case of the first time updating the > > restart and confirmed_flush LSN while syncing the slots. I think we > > can keep the current logic as it is based on the following analysis. > > > > For each logical slot, cases possible on the primary: > > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet > > reached the consistent point. > > 2. The restart_lsn doesn't have a serialized snapshot but has reached > > a consistent point. > > 3. The restart_lsn has a serialized snapshot which means it has > > reached a consistent point as well. > > > > Considering we keep the logic to reserve initial WAL positions the > > same as the current (Reserve WAL for the currently active local slot > > using the specified WAL location (restart_lsn). If the given WAL > > location has been removed, reserve WAL using the oldest existing WAL > > segment.), I could think of the below > > scenarios: > > A. For 1, we shouldn't sync the slot as it still wouldn't have been > > marked persistent on the primary. > > B. For 2, we would sync the slot > > B1. If remote_restart_lsn >= local_resart_lsn, then advance the > > slot by calling pg_logical_replication_slot_advance(). > > B11. If we reach consistent point, then it should be okay > > because after promotion as well we should reach consistent point. > > B111. 
But again is it possible that there is some xact > > that comes before consistent_point on primary and the same is after > > consistent_point on standby? This shouldn't matter as we will start > > decoding transactions after confirmed_flush_lsn which would be the same on > primary and standby. > > B22. If we haven't reached consistent_point, then we won't mark > > the slot as persistent, and at the next sync we will do the same till > > it reaches consistent_point. At that time, the situation will be similar to B11. > > B2. If remote_restart_lsn < local_restart_lsn, then we will wait > > for the next sync cycle and keep the slot as temporary. Once in the > > next or some consecutive sync cycle, we reach the condition > > remote_restart_lsn >= local_restart_lsn, we will proceed to advance > > the slot and we should have the same behavior as B1. > > C. For 3, we would sync the slot, but the behavior should be the same as B. > > > > Thoughts? > > Looks reasonable to me. > > Here is the patch based on above lines. > I am also testing and verifying the patch locally. Attach a new version patch which fixed an un-initialized variable issue and added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so that we can analyze the possible CFbot failures easily. Best Regards, Hou zj
Attachment
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so that > we can analyze the possible CFbot failures easily. As suggested by Amit in [1], for the fix being discussed where we need to advance the synced slot on standby, we need to skip the dbid check in fast_forward mode in CreateDecodingContext(). We tried few tests to make sure that there was no table-access done during fast-forward mode 1) Initially we tried avoiding database-id check in CreateDecodingContext() only when called by pg_logical_replication_slot_advance(). 'make check-world' passed on HEAD for the same. 2) But the more generic solution was to skip the database check if "fast_forward" is true. It was tried and 'make check-world' passed on HEAD for that as well. 3) Another thing tried by Hou-San was to run pgbench after skipping db check in the fast_forward logical decoding case. pgbench was run to generate some changes and then the logical slot was advanced to the latest position in another database. A LOG was added in relation_open to catch table access. It was found that there was no table-access in fast forward logical decoding i.e. no LOGS for table-open were generated during the test. Steps given at [2] [1]: https://www.postgresql.org/message-id/CAA4eK1KMiKangJa4NH_K1oFc87Y01n3rnpuwYagT59Y%3DADW8Dw%40mail.gmail.com [2]: -------------- 1. apply the DEBUG patch (attached as .txt) which will log the relation open and table cache access. 2. create a slot: SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); 3. run pgbench to generate some data. pgbench -i postgres pgbench --aggregate-interval=5 --time=5 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 postgres 4. start a fresh session in a different db and advance the slot to the latest position. There should be no relation open or CatCache log between the LOG "starting logical decoding for slot .." and LOG "decoding over". SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn()); -------------- thanks Shveta
Attachment
Dear Hou, Thanks for updating the patch! Here is a comment for it. ``` + /* + * By advancing the restart_lsn, confirmed_lsn, and xmin using + * fast-forward logical decoding, we can verify whether a consistent + * snapshot can be built. This process also involves saving necessary + * snapshots to disk during decoding, ensuring that logical decoding + * efficiently reaches a consistent point at the restart_lsn without + * the potential loss of data during snapshot creation. + */ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); + ReplicationSlotsComputeRequiredLSN(); + updated_lsn = true; ``` You added these calls like pg_replication_slot_advance() does, but that function also calls ReplicationSlotsComputeRequiredXmin(false) at that time. According to the related commit b48df81 and discussions [1], I know it is needed only for physical slots, but it would be more consistent to call requiredXmin() as well, per [2]: ``` This may be a waste if no advancing is done, but it could also be an advantage to enforce a recalculation of the thresholds for each function call. And that's more consistent with the slot copy, drop and creation. ``` What do you think? [1]: https://www.postgresql.org/message-id/20200609171904.kpltxxvjzislidks%40alap3.anarazel.de [2]: https://www.postgresql.org/message-id/20200616072727.GA2361%40paquier.xyz Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Hi, On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so that > we can analyze the possible CFbot failures easily. > Thanks! + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* + * By advancing the restart_lsn, confirmed_lsn, and xmin using + * fast-forward logical decoding, we ensure that the required snapshots + * are saved to disk. This enables logical decoding to quickly reach a + * consistent point at the restart_lsn, eliminating the risk of missing + * data during snapshot creation. + */ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); + ReplicationSlotsComputeRequiredLSN(); + updated_lsn = true; + } Instead of using pg_logical_replication_slot_advance() for each synced slot and during sync cycles what about?: - keep sync slot synchronization as it is currently (not using pg_logical_replication_slot_advance()) - create "an hidden" logical slot if sync slot feature is on - at the time of promotion use pg_logical_replication_slot_advance() on this hidden slot only to advance to the max lsn of the synced slots I'm not sure that would be enough, just asking your thoughts on this (benefits would be to avoid calling pg_logical_replication_slot_advance() on each sync slots and during the sync cycles). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. > The other approach to fix this issue could be that the slotsync worker get the serialized snapshot using pg_read_binary_file() corresponding to restart_lsn and writes those at standby. But there are cases when we won't have such a file like (a) when we initially create the slot and reach the consistent_point, or (b) also by the time the slotsync worker starts to read the remote snapshot file, the snapshot file could have been removed by the checkpointer on the primary (if the restart_lsn of the remote has been advanced in this window). So, in such cases, we anyway need to advance the slot. I think these could be optimizations that we could do in the future. Few comments: ============= 1. - if (slot->data.database != MyDatabaseId) + if (slot->data.database != MyDatabaseId && !fast_forward) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("replication slot \"%s\" was not created in this database", @@ -526,7 +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, * Do not allow consumption of a "synchronized" slot until the standby * gets promoted. */ - if (RecoveryInProgress() && slot->data.synced) + if (RecoveryInProgress() && slot->data.synced && !IsSyncingReplicationSlots()) Add comments at both of the above places. 2. +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, + bool *found_consistent_point); + This API looks a bit awkward as the functionality doesn't match the name. How about having a function with name LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, ready_for_decoding) with the same functionality as your patch has for pg_logical_replication_slot_advance() and then invoke it both from pg_logical_replication_slot_advance and slotsync.c. The function name is too big, we can think of a shorter name. Any ideas? -- With Regards, Amit Kapila.
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > Thanks! > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > + { > + /* > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > + * fast-forward logical decoding, we ensure that the required > snapshots > + * are saved to disk. This enables logical decoding to quickly > reach a > + * consistent point at the restart_lsn, eliminating the risk of > missing > + * data during snapshot creation. > + */ > + > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > + > found_consistent_point); > + ReplicationSlotsComputeRequiredLSN(); > + updated_lsn = true; > + } > > Instead of using pg_logical_replication_slot_advance() for each synced slot and > during sync cycles what about?: > > - keep sync slot synchronization as it is currently (not using > pg_logical_replication_slot_advance()) > - create "an hidden" logical slot if sync slot feature is on > - at the time of promotion use pg_logical_replication_slot_advance() on this > hidden slot only to advance to the max lsn of the synced slots > > I'm not sure that would be enough, just asking your thoughts on this (benefits > would be to avoid calling pg_logical_replication_slot_advance() on each sync > slots and during the sync cycles). Thanks for the idea! I considered this. I think advancing the "hidden" slot on promotion may be a bit late, because if we cannot reach the consistent point after advancing the "hidden" slot, then it means we may need to remove all the synced slots, as we are not sure if they are usable (will not lose data) after promotion. And it may confuse users a bit, as they have seen these slots as sync-ready. The current approach is to mark such inconsistent slots as temporary and persist them once they reach the consistent point, so that users can be sure a slot can be used after promotion once it is persisted. Another optimization idea is to check for the snapshot file's existence before calling the slot_advance(). If the file already exists, we skip the decoding and directly update the restart_lsn. This way, we could also avoid some duplicate decoding work. Best Regards, Hou zj
Hi, On Fri, Mar 29, 2024 at 07:23:11AM +0000, Zhijie Hou (Fujitsu) wrote: > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > > > > Thanks! > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > + { > > + /* > > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > > + * fast-forward logical decoding, we ensure that the required > > snapshots > > + * are saved to disk. This enables logical decoding to quickly > > reach a > > + * consistent point at the restart_lsn, eliminating the risk of > > missing > > + * data during snapshot creation. > > + */ > > + > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > + > > found_consistent_point); > > + ReplicationSlotsComputeRequiredLSN(); > > + updated_lsn = true; > > + } > > > > Instead of using pg_logical_replication_slot_advance() for each synced slot and > > during sync cycles what about?: > > > > - keep sync slot synchronization as it is currently (not using > > pg_logical_replication_slot_advance()) > > - create "an hidden" logical slot if sync slot feature is on > > - at the time of promotion use pg_logical_replication_slot_advance() on this > > hidden slot only to advance to the max lsn of the synced slots > > > > I'm not sure that would be enough, just asking your thoughts on this (benefits > > would be to avoid calling pg_logical_replication_slot_advance() on each sync > > slots and during the sync cycles). > > Thanks for the idea ! > > I considered about this. I think advancing the "hidden" slot on promotion may be a > bit late, because if we cannot reach the consistent point after advancing the > "hidden" slot, then it means we may need to remove all the synced slots as we > are not sure if they are usable(will not loss data) after promotion. What about advancing the hidden slot during the sync cycles then? > The current approach is to mark such un-consistent slot as temp and persist > them once it reaches consistent point, so that user can ensure the slot can be > used after promotion once persisted. Right, but do we need to do so for all the sync slots? Would a single hidden slot be enough? > Another optimization idea is to check the snapshot file existence before calling the > slot_advance(). If the file already exists, we skip the decoding and directly > update the restart_lsn. This way, we could also avoid some duplicate decoding > work. Yeah, I think it's a good idea (even better if we can do this check without performing any I/O). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Mar 29, 2024 at 9:34 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Thanks for updating the patch! Here is a comment for it. > > ``` > + /* > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > + * fast-forward logical decoding, we can verify whether a consistent > + * snapshot can be built. This process also involves saving necessary > + * snapshots to disk during decoding, ensuring that logical decoding > + * efficiently reaches a consistent point at the restart_lsn without > + * the potential loss of data during snapshot creation. > + */ > + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > + found_consistent_point); > + ReplicationSlotsComputeRequiredLSN(); > + updated_lsn = true; > ``` > > You added them like pg_replication_slot_advance(), but the function also calls > ReplicationSlotsComputeRequiredXmin(false) at that time. According to the related > commit b48df81 and discussions [1], I know it is needed only for physical slots, > but it makes more consistent to call requiredXmin() as well, per [2]: > Yeah, I also think it is okay to call for the sake of consistency with pg_replication_slot_advance(). -- With Regards, Amit Kapila.
On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Mar 29, 2024 at 07:23:11AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > Attach a new version patch which fixed an un-initialized variable > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > > > > > > > Thanks! > > > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > > + { > > > + /* > > > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > > > + * fast-forward logical decoding, we ensure that the required > > > snapshots > > > + * are saved to disk. This enables logical decoding to quickly > > > reach a > > > + * consistent point at the restart_lsn, eliminating the risk of > > > missing > > > + * data during snapshot creation. > > > + */ > > > + > > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > > + > > > found_consistent_point); > > > + ReplicationSlotsComputeRequiredLSN(); > > > + updated_lsn = true; > > > + } > > > > > > Instead of using pg_logical_replication_slot_advance() for each synced slot and > > > during sync cycles what about?: > > > > > > - keep sync slot synchronization as it is currently (not using > > > pg_logical_replication_slot_advance()) > > > - create "an hidden" logical slot if sync slot feature is on > > > - at the time of promotion use pg_logical_replication_slot_advance() on this > > > hidden slot only to advance to the max lsn of the synced slots > > > > > > I'm not sure that would be enough, just asking your thoughts on this (benefits > > > would be to avoid calling pg_logical_replication_slot_advance() on each sync > > > slots and during the sync cycles). > > > > Thanks for the idea ! > > > > I considered about this. I think advancing the "hidden" slot on promotion may be a > > bit late, because if we cannot reach the consistent point after advancing the > > "hidden" slot, then it means we may need to remove all the synced slots as we > > are not sure if they are usable(will not loss data) after promotion. > > What about advancing the hidden slot during the sync cycles then? > > > The current approach is to mark such un-consistent slot as temp and persist > > them once it reaches consistent point, so that user can ensure the slot can be > > used after promotion once persisted. > > Right, but do we need to do so for all the sync slots? Would a single hidden > slot be enough? > Even if we mark one of the synced slots as persistent without reaching a consistent state, it could create a problem after promotion. And, how a single hidden slot would serve the purpose, different synced slots will have different restart/confirmed_flush LSN and we won't be able to perform advancing for those using a single slot. For example, say for first synced slot, it has not reached a consistent state and then how can it try for the second slot? This sounds quite tricky to make work. We should go with something simple where the chances of introducing bugs are lesser. -- With Regards, Amit Kapila.
Hi, On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote: > On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Fri, Mar 29, 2024 at 07:23:11AM +0000, Zhijie Hou (Fujitsu) wrote: > > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On Fri, Mar 29, 2024 at 01:06:15AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > Attach a new version patch which fixed an un-initialized variable > > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > > > > > > > > > > Thanks! > > > > > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > > > + { > > > > + /* > > > > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > > > > + * fast-forward logical decoding, we ensure that the required > > > > snapshots > > > > + * are saved to disk. This enables logical decoding to quickly > > > > reach a > > > > + * consistent point at the restart_lsn, eliminating the risk of > > > > missing > > > > + * data during snapshot creation. > > > > + */ > > > > + > > > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > > > + > > > > found_consistent_point); > > > > + ReplicationSlotsComputeRequiredLSN(); > > > > + updated_lsn = true; > > > > + } > > > > > > > > Instead of using pg_logical_replication_slot_advance() for each synced slot and > > > > during sync cycles what about?: > > > > > > > > - keep sync slot synchronization as it is currently (not using > > > > pg_logical_replication_slot_advance()) > > > > - create "an hidden" logical slot if sync slot feature is on > > > > - at the time of promotion use pg_logical_replication_slot_advance() on this > > > > hidden slot only to advance to the max lsn of the synced slots > > > > > > > > I'm not sure that would be enough, just asking your thoughts on this (benefits > > > > would be to avoid calling pg_logical_replication_slot_advance() on each sync > > > > slots and during the sync cycles). > > > > > > Thanks for the idea ! > > > > > > I considered about this. I think advancing the "hidden" slot on promotion may be a > > > bit late, because if we cannot reach the consistent point after advancing the > > > "hidden" slot, then it means we may need to remove all the synced slots as we > > > are not sure if they are usable(will not loss data) after promotion. > > > > What about advancing the hidden slot during the sync cycles then? > > > > > The current approach is to mark such un-consistent slot as temp and persist > > > them once it reaches consistent point, so that user can ensure the slot can be > > > used after promotion once persisted. > > > > Right, but do we need to do so for all the sync slots? Would a single hidden > > slot be enough? > > > > Even if we mark one of the synced slots as persistent without reaching > a consistent state, it could create a problem after promotion. And, > how a single hidden slot would serve the purpose, different synced > slots will have different restart/confirmed_flush LSN and we won't be > able to perform advancing for those using a single slot. For example, > say for first synced slot, it has not reached a consistent state and > then how can it try for the second slot? This sounds quite tricky to > make work. We should go with something simple where the chances of > introducing bugs are lesser. Yeah, better to go with something simple. 
+ if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* + * By advancing the restart_lsn, confirmed_lsn, and xmin using + * fast-forward logical decoding, we ensure that the required snapshots + * are saved to disk. This enables logical decoding to quickly reach a + * consistent point at the restart_lsn, eliminating the risk of missing + * data during snapshot creation. + */ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); In our case, what about skipping WaitForStandbyConfirmation() in pg_logical_replication_slot_advance()? (It could go until the RecoveryInProgress() check in StandbySlotsHaveCaughtup() if we don't skip it). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Friday, March 29, 2024 2:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. > > > > The other approach to fix this issue could be that the slotsync worker get the > serialized snapshot using pg_read_binary_file() corresponding to restart_lsn > and writes those at standby. But there are cases when we won't have such a file > like (a) when we initially create the slot and reach the consistent_point, or (b) > also by the time the slotsync worker starts to read the remote snapshot file, the > snapshot file could have been removed by the checkpointer on the primary (if > the restart_lsn of the remote has been advanced in this window). So, in such > cases, we anyway need to advance the slot. I think these could be optimizations > that we could do in the future. > > Few comments: Thanks for the comments. > ============= > 1. > - if (slot->data.database != MyDatabaseId) > + if (slot->data.database != MyDatabaseId && !fast_forward) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("replication slot \"%s\" was not created in this database", @@ -526,7 > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > * Do not allow consumption of a "synchronized" slot until the standby > * gets promoted. > */ > - if (RecoveryInProgress() && slot->data.synced) > + if (RecoveryInProgress() && slot->data.synced && > + !IsSyncingReplicationSlots()) > > > Add comments at both of the above places. Added. > > > 2. > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > + bool *found_consistent_point); > + > > This API looks a bit awkward as the functionality doesn't match the name. How > about having a function with name > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > ready_for_decoding) with the same functionality as your patch has for > pg_logical_replication_slot_advance() and then invoke it both from > pg_logical_replication_slot_advance and slotsync.c. The function name is too > big, we can think of a shorter name. Any ideas? How about LogicalSlotAdvanceAndCheckDecodingState() Or just LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be renamed in next version if we agree). Attach the V3 patch which addressed above comments and Kuroda-san's comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn after syncing, as the restart_lsn could be different from the remote one due to the new slot_advance() call. I am also testing some optimization idea locally and will share if ready. [1] https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com Best Regards, Hou zj
Attachment
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > [2] The steps to reproduce the data miss issue on a primary->standby setup: I'm trying to reproduce the problem with [1], but I can see the changes after the standby is promoted. Am I missing anything here? ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" lsn | xid | data -----------+-----+--------------------------------------------- 0/30000B0 | 738 | BEGIN 738 0/3017FC8 | 738 | COMMIT 738 0/3017FF8 | 739 | BEGIN 739 0/3019A38 | 739 | COMMIT 739 0/3019A38 | 740 | BEGIN 740 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 0/3019AA8 | 740 | COMMIT 740 (7 rows) [1] -#define LOG_SNAPSHOT_INTERVAL_MS 15000 +#define LOG_SNAPSHOT_INTERVAL_MS 1500000 ./initdb -D db17 echo "archive_mode = on archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' wal_level='logical' autovacuum = off checkpoint_timeout='1h'" | tee -a db17/postgresql.conf ./pg_ctl -D db17 -l logfile17 start rm -rf sbdata logfilesbdata ./pg_basebackup -D sbdata ./psql -d postgres -p 5432 -c "SELECT pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', false, false, true);" ./psql -d postgres -p 5432 -c "SELECT pg_create_physical_replication_slot('phy_repl_slot', true, false);" echo "port=5433 primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' primary_slot_name='phy_repl_slot' restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' hot_standby_feedback=on sync_replication_slots=on" | tee -a sbdata/postgresql.conf touch sbdata/standby.signal ./pg_ctl -D sbdata -l logfilesbdata start ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" ./psql -d postgres SESSION1, TXN1 BEGIN; create table dummy1(a int); SESSION2, TXN2 SELECT pg_log_standby_snapshot(); SESSION1, TXN1 COMMIT; SESSION1, TXN1 BEGIN; create table dummy2(a int); SESSION2, TXN2 SELECT pg_log_standby_snapshot(); SESSION1, TXN1 COMMIT; ./psql -d postgres -p 5432 -c "SELECT pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" ./psql -d postgres -p 5432 -c "INSERT INTO dummy1 VALUES(999);" ./psql -d postgres -p 5433 -c "SELECT pg_promote();" ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" ./psql -d postgres -p 5433 -c "select * from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > [2] The steps to reproduce the data miss issue on a primary->standby setup: > > I'm trying to reproduce the problem with [1], but I can see the > changes after the standby is promoted. Am I missing anything here? > > ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * > from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" > lsn | xid | data > -----------+-----+--------------------------------------------- > 0/30000B0 | 738 | BEGIN 738 > 0/3017FC8 | 738 | COMMIT 738 > 0/3017FF8 | 739 | BEGIN 739 > 0/3019A38 | 739 | COMMIT 739 > 0/3019A38 | 740 | BEGIN 740 > 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 > 0/3019AA8 | 740 | COMMIT 740 > (7 rows) > > [1] > -#define LOG_SNAPSHOT_INTERVAL_MS 15000 > +#define LOG_SNAPSHOT_INTERVAL_MS 1500000 > > ./initdb -D db17 > echo "archive_mode = on > archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' > wal_level='logical' > autovacuum = off > checkpoint_timeout='1h'" | tee -a db17/postgresql.conf > > ./pg_ctl -D db17 -l logfile17 start > > rm -rf sbdata logfilesbdata > ./pg_basebackup -D sbdata > > ./psql -d postgres -p 5432 -c "SELECT > pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', > false, false, true);" > ./psql -d postgres -p 5432 -c "SELECT > pg_create_physical_replication_slot('phy_repl_slot', true, false);" > > echo "port=5433 > primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' > primary_slot_name='phy_repl_slot' > restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' > hot_standby_feedback=on > sync_replication_slots=on" | tee -a sbdata/postgresql.conf > > touch sbdata/standby.signal > > ./pg_ctl -D sbdata -l logfilesbdata start > ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" > > ./psql -d postgres > > SESSION1, TXN1 > BEGIN; > create table dummy1(a int); > > SESSION2, TXN2 > SELECT pg_log_standby_snapshot(); > > SESSION1, TXN1 > COMMIT; > > SESSION1, TXN1 > BEGIN; > create table dummy2(a int); > > SESSION2, TXN2 > SELECT pg_log_standby_snapshot(); > > SESSION1, TXN1 > COMMIT; > > ./psql -d postgres -p 5432 -c "SELECT > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" > After this step and before the next, did you ensure that the slot sync has synced the latest confirmed_flush/restart LSNs? You can query: "select slot_name,restart_lsn, confirmed_flush_lsn from pg_replication_slots;" to ensure the same on both the primary and standby. -- With Regards, Amit Kapila.
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> > > wrote: > > > > > > > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. > > > > > > > The other approach to fix this issue could be that the slotsync worker > > get the serialized snapshot using pg_read_binary_file() corresponding > > to restart_lsn and writes those at standby. But there are cases when > > we won't have such a file like (a) when we initially create the slot > > and reach the consistent_point, or (b) also by the time the slotsync > > worker starts to read the remote snapshot file, the snapshot file > > could have been removed by the checkpointer on the primary (if the > > restart_lsn of the remote has been advanced in this window). So, in > > such cases, we anyway need to advance the slot. I think these could be > optimizations that we could do in the future. > > > > Few comments: > > Thanks for the comments. > > > ============= > > 1. > > - if (slot->data.database != MyDatabaseId) > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("replication slot \"%s\" was not created in this database", > > @@ -526,7 > > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > > * Do not allow consumption of a "synchronized" slot until the standby > > * gets promoted. > > */ > > - if (RecoveryInProgress() && slot->data.synced) > > + if (RecoveryInProgress() && slot->data.synced && > > + !IsSyncingReplicationSlots()) > > > > > > Add comments at both of the above places. > > Added. > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the > > name. How about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name > > is too big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested > LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be > renamed in next version if we agree). > > Attach the V3 patch which addressed above comments and Kuroda-san's > comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn > after syncing, as the restart_lsn could be different from the remote one due to > the new slot_advance() call. I am also testing some optimization idea locally and > will share if ready. Attach the V4 patch which includes the optimization to skip the decoding if the snapshot at the syncing restart_lsn is already serialized. It can avoid most of the duplicate decoding in my test, and I am doing some more tests locally. Best Regards, Hou zj
Attachment
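A crude way to see that the sync path is now serializing snapshots on the standby (and that the skip-if-already-serialized optimization has something to reuse) is to watch the logical decoding snapshot directory, e.g.:

-- on the standby
SELECT count(*) AS snapshot_files,
       pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_logicalsnapdir();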
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > [2] The steps to reproduce the data miss issue on a primary->standby setup: > > > > I'm trying to reproduce the problem with [1], but I can see the > > changes after the standby is promoted. Am I missing anything here? > > > > ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * > > from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" > > lsn | xid | data > > -----------+-----+--------------------------------------------- > > 0/30000B0 | 738 | BEGIN 738 > > 0/3017FC8 | 738 | COMMIT 738 > > 0/3017FF8 | 739 | BEGIN 739 > > 0/3019A38 | 739 | COMMIT 739 > > 0/3019A38 | 740 | BEGIN 740 > > 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 > > 0/3019AA8 | 740 | COMMIT 740 > > (7 rows) > > > > [1] > > -#define LOG_SNAPSHOT_INTERVAL_MS 15000 > > +#define LOG_SNAPSHOT_INTERVAL_MS 1500000 > > > > ./initdb -D db17 > > echo "archive_mode = on > > archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' > > wal_level='logical' > > autovacuum = off > > checkpoint_timeout='1h'" | tee -a db17/postgresql.conf > > > > ./pg_ctl -D db17 -l logfile17 start > > > > rm -rf sbdata logfilesbdata > > ./pg_basebackup -D sbdata > > > > ./psql -d postgres -p 5432 -c "SELECT > > pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', > > false, false, true);" > > ./psql -d postgres -p 5432 -c "SELECT > > pg_create_physical_replication_slot('phy_repl_slot', true, false);" > > > > echo "port=5433 > > primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' > > primary_slot_name='phy_repl_slot' > > restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' > > hot_standby_feedback=on > > sync_replication_slots=on" | tee -a sbdata/postgresql.conf > > > > touch sbdata/standby.signal > > > > ./pg_ctl -D sbdata -l logfilesbdata start > > ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" > > > > ./psql -d postgres > > > > SESSION1, TXN1 > > BEGIN; > > create table dummy1(a int); > > > > SESSION2, TXN2 > > SELECT pg_log_standby_snapshot(); > > > > SESSION1, TXN1 > > COMMIT; > > > > SESSION1, TXN1 > > BEGIN; > > create table dummy2(a int); > > > > SESSION2, TXN2 > > SELECT pg_log_standby_snapshot(); > > > > SESSION1, TXN1 > > COMMIT; > > > > ./psql -d postgres -p 5432 -c "SELECT > > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" > > > > After this step and before the next, did you ensure that the slot sync > has synced the latest confirmed_flush/restart LSNs? You can query: > "select slot_name,restart_lsn, confirmed_flush_lsn from > pg_replication_slots;" to ensure the same on both the primary and > standby. +1. To ensure last sync, one can run this manually on standby just before promotion : SELECT pg_sync_replication_slots(); thanks Shveta
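Put together, a pre-promotion check on the standby along the lines suggested above (using the slot name 'lrep_sync_slot' from the reproduction steps) could look like:

SELECT pg_sync_replication_slots();   -- force one final sync cycle

SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'lrep_sync_slot';
-- the LSNs reported here should match the same query on the primary
-- before pg_promote() is called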
Hi, On Mon, Apr 01, 2024 at 06:05:34AM +0000, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid most > of the duplicate decoding in my test, and I am doing some more tests locally. > Thanks! 1 === Same comment as in [1]. In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing slots then I think that we can skip: + /* + * Wait for specified streaming replication standby servers (if any) + * to confirm receipt of WAL up to moveto lsn. + */ + WaitForStandbyConfirmation(moveto); Indeed if we are dealing with synced slot then we know we're in RecoveryInProgress(). Then there is no need to call WaitForStandbyConfirmation() as it could go until the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we already know it). 2 === + { + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) + { That could call SnapBuildSnapshotExists() multiple times for the same "restart_lsn" (for example in case of multiple remote slots to sync). What if the sync worker records the last lsn it asks for serialization (and serialized ? Then we could check that value first before deciding to call (or not) SnapBuildSnapshotExists() on it? It's not ideal because it would record "only the last one" but that would be simple enough for now (currently there is only one sync worker so that scenario is likely to happen). Maybe an idea for future improvement (not for now) could be that SnapBuildSerialize() maintains a "small list" of "already serialized" snapshots. [1]: https://www.postgresql.org/message-id/ZgayTFIhLfzhpHci%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
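For what it's worth, the "record the last LSN" idea in comment 2 above could be pictured roughly as below. This is a hypothetical sketch only: the static variable and helper name are invented for illustration, and SnapBuildSnapshotExists() is the helper added by the patch under discussion.

```c
/*
 * Hypothetical sketch of the suggestion: remember the last restart_lsn
 * for which a serialized snapshot was found, so repeated
 * SnapBuildSnapshotExists() calls for the same LSN can be skipped
 * within one sync worker.
 */
static XLogRecPtr last_serialized_restart_lsn = InvalidXLogRecPtr;

static bool
snapshot_exists_cached(XLogRecPtr restart_lsn)
{
	/* Fast path: we already found a snapshot at this LSN earlier. */
	if (restart_lsn == last_serialized_restart_lsn)
		return true;

	if (SnapBuildSnapshotExists(restart_lsn))
	{
		last_serialized_restart_lsn = restart_lsn;
		return true;
	}

	return false;
}
```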
Did a performance test on the optimization patch (v2-0001-optimize-the-slot-advancement.patch). Please find the results:

Setup:
- One primary node with 100 failover-enabled logical slots
- 20 DBs, each having 5 failover-enabled logical replication slots
- One physical standby node with 'sync_replication_slots' off but the other parameters required by slot sync enabled

Node configurations: please see config.txt

Test plan:
1) Create 20 databases on the primary node, each with 5 failover slots using "pg_create_logical_replication_slot()". Overall 100 failover slots.
2) Use pg_sync_replication_slot() to sync them to the standby. Note the execution time of the sync and the LSN values.
3) On the primary node, run pgbench for 15 mins on the postgres db.
4) Advance the LSNs of all 100 slots on the primary using pg_replication_slot_advance().
5) Use pg_sync_replication_slot() to sync the slots to the standby. Note the execution time of the sync and the LSN values.

Executed the above test plan for three cases and compared the elapsed times:

(1) HEAD
Time taken by pg_sync_replication_slot() on the standby node:
a) The initial sync (step 2) = 140.208 ms
b) Sync after pgbench run on primary (step 5) = 66.994 ms

(2) HEAD + v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
a) The initial sync (step 2) = 163.885 ms
b) Sync after pgbench run on primary (step 5) = 837901.290 ms (13:57.901)
>> With the v3 patch, pg_sync_replication_slot() takes a significant amount of time to sync the slots.

(3) HEAD + v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch + v2-0001-optimize-the-slot-advancement.patch
a) The initial sync (step 2) = 165.554 ms
b) Sync after pgbench run on primary (step 5) = 7991.718 ms (00:07.992)
>> With the optimization patch, the time taken by pg_sync_replication_slot() is reduced significantly, to ~7 seconds.

We did the same test with a single DB too, by creating all 100 failover slots in the postgres DB, and the results were almost similar.

Attached the scripts used for the test ("v3_perf_test_scripts.tar.gz"), which includes these files:
setup_multidb.sh : setup primary and standby nodes
createdb20.sql : create 20 DBs
createslot20.sql : create total 100 logical slots, 5 on each DB
run_sync.sql : call pg_replication_slot_advance() with timing
advance20.sql : advance lsn of all slots on Primary node to current lsn
advance20_perdb.sql : use on HEAD to advance lsn on Primary node
get_synced_data.sql : get details of the
config.txt : configuration used for nodes
Attachment
On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the name. How > > about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name is too > > big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? > It is about snapbuild state, so how about naming the function as LogicalSlotAdvanceAndCheckSnapState()? I have made quite a few cosmetic changes in comments and code. See attached. This is atop your latest patch. Can you please review and include these changes in the next version? -- With Regards, Amit Kapila.
Attachment
On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Mon, Apr 01, 2024 at 06:05:34AM +0000, Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V4 patch which includes the optimization to skip the decoding if > > the snapshot at the syncing restart_lsn is already serialized. It can avoid most > > of the duplicate decoding in my test, and I am doing some more tests locally. > > > > Thanks! > > 1 === > > Same comment as in [1]. > > In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing slots > then I think that we can skip: > > + /* > + * Wait for specified streaming replication standby servers (if any) > + * to confirm receipt of WAL up to moveto lsn. > + */ > + WaitForStandbyConfirmation(moveto); > > Indeed if we are dealing with synced slot then we know we're in RecoveryInProgress(). > > Then there is no need to call WaitForStandbyConfirmation() as it could go until > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we already > know it). > Won't it will normally return from the first check in WaitForStandbyConfirmation() because standby_slot_names_config is not set on standby? > 2 === > > + { > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > + { > > That could call SnapBuildSnapshotExists() multiple times for the same > "restart_lsn" (for example in case of multiple remote slots to sync). > > What if the sync worker records the last lsn it asks for serialization (and > serialized ? Then we could check that value first before deciding to call (or not) > SnapBuildSnapshotExists() on it? > > It's not ideal because it would record "only the last one" but that would be > simple enough for now (currently there is only one sync worker so that scenario > is likely to happen). > Yeah, we could do that but I am not sure how much it can help. I guess we could do some tests to see if it helps. -- With Regards, Amit Kapila.
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > After this step and before the next, did you ensure that the slot sync > has synced the latest confirmed_flush/restart LSNs? You can query: > "select slot_name,restart_lsn, confirmed_flush_lsn from > pg_replication_slots;" to ensure the same on both the primary and > standby. Yes, after ensuring the slot is synced on the standby, the problem is reproduced for me, and the proposed patch fixes it (i.e. I am able to see the changes even after the promotion). I'm thinking about adding a TAP test for this issue, but one key aspect of this reproducer is to prevent anything from writing a RUNNING_XACTS WAL record on the primary before the standby promotion. Setting bgwriter_delay to max isn't helping me. I think we can use an injection point to add a delay in LogStandbySnapshot() so that this problem is reproduced consistently in a TAP test. Perhaps we can add this later, after the fix is shipped. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Then there is no need to call WaitForStandbyConfirmation() as it could go until > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we already > > know it). > > > > Won't it will normally return from the first check in > WaitForStandbyConfirmation() because standby_slot_names_config is not > set on standby? I think standby_slot_names can be set on a standby. One could want to set it in a cascading standby env (though it won't have any real effects until the standby is promoted). > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of multiple remote slots to sync). > > > > What if the sync worker records the last lsn it asks for serialization (and > > serialized ? Then we could check that value first before deciding to call (or not) > > SnapBuildSnapshotExists() on it? > > > > It's not ideal because it would record "only the last one" but that would be > > simple enough for now (currently there is only one sync worker so that scenario > > is likely to happen). > > > > Yeah, we could do that but I am not sure how much it can help. I guess > we could do some tests to see if it helps. Yeah not sure either. I just think it can only help and shouldn't make things worst (but could avoid extra SnapBuildSnapshotExists() calls). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Monday, April 1, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > > > > > > > > > 2. > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > > + bool *found_consistent_point); > > > + > > > > > > This API looks a bit awkward as the functionality doesn't match the > > > name. How about having a function with name > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > ready_for_decoding) with the same functionality as your patch has > > > for > > > pg_logical_replication_slot_advance() and then invoke it both from > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > name is too big, we can think of a shorter name. Any ideas? > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > LogicalSlotAdvanceAndCheckDecoding()? > > > > It is about snapbuild state, so how about naming the function as > LogicalSlotAdvanceAndCheckSnapState()? It looks better to me, so changed. > > I have made quite a few cosmetic changes in comments and code. See > attached. This is atop your latest patch. Can you please review and include > these changes in the next version? Thanks, I have reviewed and merged them. Attach the V5 patch set which addressed above comments and ran pgindent. I will think and test the improvement suggested by Bertrand[1] and reply after that. [1] https://www.postgresql.org/message-id/Zgp8n9QD5nYSESnM%40ip-10-97-1-34.eu-west-3.compute.internal Best Regards, Hou zj
Attachment
On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid most > of the duplicate decoding in my test, and I am doing some more tests locally. Thanks for the patch. I'm thinking if we can reduce the amount of work that we do for synced slots in each sync worker cycle. With that context in mind, why do we need to create decoding context every time? Can't we create it once, store it in an in-memory structure and use it for each sync worker cycle? Is there any problem with it? What do you think? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Apr 1, 2024 at 6:58 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > Then there is no need to call WaitForStandbyConfirmation() as it could go until > > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we already > > > know it). > > > > > > > Won't it will normally return from the first check in > > WaitForStandbyConfirmation() because standby_slot_names_config is not > > set on standby? > > I think standby_slot_names can be set on a standby. One could want to set it in > a cascading standby env (though it won't have any real effects until the standby > is promoted). > Yeah, it is possible but doesn't seem worth additional checks for this micro-optimization. -- With Regards, Amit Kapila.
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > Attach the V4 patch which includes the optimization to skip the > > decoding if the snapshot at the syncing restart_lsn is already > > serialized. It can avoid most of the duplicate decoding in my test, and I am > doing some more tests locally. > > Thanks for the patch. I'm thinking if we can reduce the amount of work that we > do for synced slots in each sync worker cycle. With that context in mind, why do > we need to create decoding context every time? > Can't we create it once, store it in an in-memory structure and use it for each > sync worker cycle? Is there any problem with it? What do you think? Thanks for the idea. I think the cost of creating the decoding context is relatively minor compared to the IO cost. After generating profiles for the tests shared by Nisha[1], it appears that StartupDecodingContext is not an issue. While the suggested refactoring is an option, I think we can consider it a future improvement and address it only if we encounter scenarios where creating the decoding context becomes a bottleneck. [1] https://www.postgresql.org/message-id/CALj2ACUeij5tFzJ1-cuoUh%2Bmhj33v%2BYgqD_gHYUpRdXSCSBbhw%40mail.gmail.com Best Regards, Hou zj
Attachment
On Mon, Apr 1, 2024 at 5:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of multiple remote slots to sync). > > > > What if the sync worker records the last lsn it asks for serialization (and > > serialized ? Then we could check that value first before deciding to call (or not) > > SnapBuildSnapshotExists() on it? > > > > It's not ideal because it would record "only the last one" but that would be > > simple enough for now (currently there is only one sync worker so that scenario > > is likely to happen). > > > > Yeah, we could do that but I am not sure how much it can help. I guess > we could do some tests to see if it helps. I had a look at the test results from Nisha's run and did not find any repetitive restart_lsn from the primary being synced to the standby for that particular 100-slot test. Unless we have some concrete test in mind (one having repetitive restart_lsn values), I do not think the given tests can establish the benefit of the suggested optimization. Attached are the log files of the all-slots test for reference, thanks Shveta
Attachment
On Monday, April 1, 2024 9:28 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > > > > > 2 === > > > > > > + { > > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > > + { > > > > > > That could call SnapBuildSnapshotExists() multiple times for the > > > same "restart_lsn" (for example in case of multiple remote slots to sync). > > > > > > What if the sync worker records the last lsn it asks for > > > serialization (and serialized ? Then we could check that value first > > > before deciding to call (or not) > > > SnapBuildSnapshotExists() on it? > > > > > > It's not ideal because it would record "only the last one" but that > > > would be simple enough for now (currently there is only one sync > > > worker so that scenario is likely to happen). > > > > > > > Yeah, we could do that but I am not sure how much it can help. I guess > > we could do some tests to see if it helps. > > Yeah not sure either. I just think it can only help and shouldn't make things > worst (but could avoid extra SnapBuildSnapshotExists() calls). Thanks for the idea. I tried some tests based on Nisha's setup[1]. I tried to advance the slots on the primary to the same restart_lsn before calling sync_replication_slots(), and reduced the data generated by pgbench. The SnapBuildSnapshotExists is still not noticeable in the profile. So, I feel we could leave this as a further improvement once we encounter scenarios where the duplicate SnapBuildSnapshotExists call becomes noticeable. [1] https://www.postgresql.org/message-id/CALj2ACUeij5tFzJ1-cuoUh%2Bmhj33v%2BYgqD_gHYUpRdXSCSBbhw%40mail.gmail.com Best Regards, Hou zj
Attachment
Hi, On Tue, Apr 02, 2024 at 04:24:49AM +0000, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 9:28 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > > > > > > > > 2 === > > > > > > > > + { > > > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > > > + { > > > > > > > > That could call SnapBuildSnapshotExists() multiple times for the > > > > same "restart_lsn" (for example in case of multiple remote slots to sync). > > > > > > > > What if the sync worker records the last lsn it asks for > > > > serialization (and serialized ? Then we could check that value first > > > > before deciding to call (or not) > > > > SnapBuildSnapshotExists() on it? > > > > > > > > It's not ideal because it would record "only the last one" but that > > > > would be simple enough for now (currently there is only one sync > > > > worker so that scenario is likely to happen). > > > > > > > > > > Yeah, we could do that but I am not sure how much it can help. I guess > > > we could do some tests to see if it helps. > > > > Yeah not sure either. I just think it can only help and shouldn't make things > > worst (but could avoid extra SnapBuildSnapshotExists() calls). > > Thanks for the idea. I tried some tests based on Nisha's setup[1]. Thank you and Nisha and Shveta for the testing! > I tried to > advance the slots on the primary to the same restart_lsn before calling > sync_replication_slots(), and reduced the data generated by pgbench. Agree that this scenario makes sense to try to see the impact of SnapBuildSnapshotExists(). > The SnapBuildSnapshotExists is still not noticeable in the profile. SnapBuildSnapshotExists() number of calls are probably negligeable when compared to the IO calls generated by the fast forward logical decoding in this scenario. > So, I feel we > could leave this as a further improvement once we encounter scenarios where > the duplicate SnapBuildSnapshotExists call becomes noticeable. Sounds reasonable to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, April 1, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> > > wrote: > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > <amit.kapila16@gmail.com> > > wrote: > > > > > > > > > > > > > > > > > > > 2. > > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > > moveto, > > > > + bool *found_consistent_point); > > > > + > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > the name. How about having a function with name > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > ready_for_decoding) with the same functionality as your patch has > > > > for > > > > pg_logical_replication_slot_advance() and then invoke it both from > > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > > name is too big, we can think of a shorter name. Any ideas? > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > It is about snapbuild state, so how about naming the function as > > LogicalSlotAdvanceAndCheckSnapState()? > > It looks better to me, so changed. > > > > > I have made quite a few cosmetic changes in comments and code. See > > attached. This is atop your latest patch. Can you please review and > > include these changes in the next version? > > Thanks, I have reviewed and merged them. > Attach the V5 patch set which addressed above comments and ran pgindent. I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can reproduce the data loss issue consistently on my machine. It may not reproduce in some rare cases if concurrent xl_running_xacts are written by bgwriter, but I think it's still valuable if it can verify the fix in most cases. The test will fail if directly applied on HEAD, and will pass after applying atop of 0001. Best Regards, Hou zj
Attachment
Hi, On Tue, Apr 02, 2024 at 07:20:46AM +0000, Zhijie Hou (Fujitsu) wrote: > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can > reproduce the data loss issue consistently on my machine. Thanks! > It may not reproduce > in some rare cases if concurrent xl_running_xacts are written by bgwriter, but > I think it's still valuable if it can verify the fix in most cases. What about adding a "wait" injection point in LogStandbySnapshot() to prevent checkpointer/bgwriter to log a standby snapshot? Something among those lines: if (AmCheckpointerProcess() || AmBackgroundWriterProcess()) INJECTION_POINT("bgw-log-standby-snapshot"); And make use of it in the test, something like: $node_primary->safe_psql('postgres', "SELECT injection_points_attach('bgw-log-standby-snapshot', 'wait');"); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Monday, April 1, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> > > > wrote: > > > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > 2. > > > > > +extern XLogRecPtr > > > > > +pg_logical_replication_slot_advance(XLogRecPtr > > > moveto, > > > > > + bool *found_consistent_point); > > > > > + > > > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > > the name. How about having a function with name > > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > > ready_for_decoding) with the same functionality as your patch > > > > > has for > > > > > pg_logical_replication_slot_advance() and then invoke it both > > > > > from pg_logical_replication_slot_advance and slotsync.c. The > > > > > function name is too big, we can think of a shorter name. Any ideas? > > > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > > > > It is about snapbuild state, so how about naming the function as > > > LogicalSlotAdvanceAndCheckSnapState()? > > > > It looks better to me, so changed. > > > > > > > > I have made quite a few cosmetic changes in comments and code. See > > > attached. This is atop your latest patch. Can you please review and > > > include these changes in the next version? > > > > Thanks, I have reviewed and merged them. > > Attach the V5 patch set which addressed above comments and ran pgindent. > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can > reproduce the data loss issue consistently on my machine. It may not > reproduce in some rare cases if concurrent xl_running_xacts are written by > bgwriter, but I think it's still valuable if it can verify the fix in most cases. The test > will fail if directly applied on HEAD, and will pass after applying atop of 0001. CFbot[1] complained about one query result's order in the tap-test, so I am attaching a V7 patch set which fixed this. There are no changes in 0001. [1] https://cirrus-ci.com/task/6375962162495488 Best Regards, Hou zj
Attachment
On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Tue, Apr 02, 2024 at 07:20:46AM +0000, Zhijie Hou (Fujitsu) wrote: > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can > > reproduce the data loss issue consistently on my machine. > > Thanks! > > > It may not reproduce > > in some rare cases if concurrent xl_running_xacts are written by bgwriter, but > > I think it's still valuable if it can verify the fix in most cases. > > What about adding a "wait" injection point in LogStandbySnapshot() to prevent > checkpointer/bgwriter to log a standby snapshot? Something among those lines: > > if (AmCheckpointerProcess() || AmBackgroundWriterProcess()) > INJECTION_POINT("bgw-log-standby-snapshot"); > > And make use of it in the test, something like: > > $node_primary->safe_psql('postgres', > "SELECT injection_points_attach('bgw-log-standby-snapshot', 'wait');"); > Sometimes we want the checkpoint to log the standby snapshot as we need it at a predictable time, maybe one can use pg_log_standby_snapshot() instead of that. Can we add an injection point as a separate patch/commit after a bit more discussion? I want to discuss this in a separate thread so that later we should not get an objection to adding an injection_point at this location. One other idea to make such tests predictable is to add a developer-specific GUC say debug_bg_log_standby_snapshot or something like that but injection point sounds like a better idea. -- With Regards, Amit Kapila.
Hi, On Tue, Apr 02, 2024 at 02:19:30PM +0530, Amit Kapila wrote: > On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > What about adding a "wait" injection point in LogStandbySnapshot() to prevent > > checkpointer/bgwriter to log a standby snapshot? Something among those lines: > > > > if (AmCheckpointerProcess() || AmBackgroundWriterProcess()) > > INJECTION_POINT("bgw-log-standby-snapshot"); > > > > And make use of it in the test, something like: > > > > $node_primary->safe_psql('postgres', > > "SELECT injection_points_attach('bgw-log-standby-snapshot', 'wait');"); > > > > Sometimes we want the checkpoint to log the standby snapshot as we > need it at a predictable time, maybe one can use > pg_log_standby_snapshot() instead of that. Can we add an injection > point as a separate patch/commit after a bit more discussion? Sure, let's come back to this injection point discussion after the feature freeze. BTW, I think it could also be useful to make use of injection point for the test that has been added in 7f13ac8123. I'll open a new thread for this at that time. >. One other > idea to make such tests predictable is to add a developer-specific GUC > say debug_bg_log_standby_snapshot or something like that but injection > point sounds like a better idea. Agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > CFbot[1] complained about one query result's order in the tap-test, so I am > attaching a V7 patch set which fixed this. There are no changes in 0001. > > [1] https://cirrus-ci.com/task/6375962162495488 Thanks. Here are some comments: 1. Can we just remove pg_logical_replication_slot_advance and use LogicalSlotAdvanceAndCheckSnapState instead? If worried about the function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to pg_logical_replication_slot_advance? + * Advance our logical replication slot forward. See + * LogicalSlotAdvanceAndCheckSnapState for details. */ static XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto) { 2. + if (!ready_for_decoding) + { + elog(DEBUG1, "could not find consistent point for synced slot; restart_lsn = %X/%X", + LSN_FORMAT_ARGS(slot->data.restart_lsn)); Can we specify the slot name in the message? 3. Also, can the "could not find consistent point for synced slot; restart_lsn = %X/%X" be emitted at LOG level just like other messages in update_and_persist_local_synced_slot. Although, I see "XXX should this be changed to elog(DEBUG1) perhaps?", these messages need to be at LOG level as they help debug issues if at all they are hit. 4. How about using found_consistent_snapshot instead of ready_for_decoding? A general understanding is that the synced slots are not allowed for decoding (although with this fix, we do that for internal purposes), ready_for_decoding looks a bit misleading. 5. As far as the test case for this issue is concerned, I'm fine with adding one using an INJECTION point because we seem to be having no consistent way to control postgres writing current snapshot to WAL. 6. A nit: can we use "fast_forward mode" instead of "fast-forward mode" just to be consistent? + * logical changes unless we are in fast-forward mode where no changes are 7. + /* + * We need to access the system tables during decoding to build the + * logical changes unless we are in fast-forward mode where no changes are + * generated. + */ + if (slot->data.database != MyDatabaseId && !fast_forward) May I know if we need this change for this fix? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > CFbot[1] complained about one query result's order in the tap-test, so I am > > attaching a V7 patch set which fixed this. There are no changes in 0001. > > > > [1] https://cirrus-ci.com/task/6375962162495488 > > Thanks. Here are some comments: Thanks for the comments. > > 1. Can we just remove pg_logical_replication_slot_advance and use > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > pg_logical_replication_slot_advance? > > + * Advance our logical replication slot forward. See > + * LogicalSlotAdvanceAndCheckSnapState for details. > */ > static XLogRecPtr > pg_logical_replication_slot_advance(XLogRecPtr moveto) > { It was commented[1] that it's not appropriate for the pg_logical_replication_slot_advance to have an out parameter 'ready_for_decoding' which looks bit awkward as the functionality doesn't match the name, and is also not consistent with the style of pg_physical_replication_slot_advance(). So, we added a new function. > > 2. > + if (!ready_for_decoding) > + { > + elog(DEBUG1, "could not find consistent point for synced > slot; restart_lsn = %X/%X", > + LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Can we specify the slot name in the message? Added. > > 3. Also, can the "could not find consistent point for synced slot; > restart_lsn = %X/%X" be emitted at LOG level just like other messages > in update_and_persist_local_synced_slot. Although, I see "XXX should > this be changed to elog(DEBUG1) perhaps?", these messages need to be > at LOG level as they help debug issues if at all they are hit. Changed to LOG and reworded the message. > > 4. How about using found_consistent_snapshot instead of > ready_for_decoding? A general understanding is that the synced slots > are not allowed for decoding (although with this fix, we do that for > internal purposes), ready_for_decoding looks a bit misleading. Agreed and renamed. > > 5. As far as the test case for this issue is concerned, I'm fine with > adding one using an INJECTION point because we seem to be having no > consistent way to control postgres writing current snapshot to WAL. Since me and my colleagues can reproduce the issue consistently after applying 0002 and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] to consider adding the INJECTION point after pushing the main fix. > > 6. A nit: can we use "fast_forward mode" instead of "fast-forward > mode" just to be consistent? > + * logical changes unless we are in fast-forward mode where no changes > are > > 7. > + /* > + * We need to access the system tables during decoding to build the > + * logical changes unless we are in fast-forward mode where no changes > are > + * generated. > + */ > + if (slot->data.database != MyDatabaseId && !fast_forward) > > May I know if we need this change for this fix? The slotsync worker needs to advance the slots from different databases in fast_forward. So, we need to skip this check in fast_forward mode. The analysis can be found in [3]. Attach the V8 patch which addressed above comments. 
[1] https://www.postgresql.org/message-id/CAA4eK1%2BwkaRi2BrLLC_0gKbHN68Awc9dRp811G3An6A6fuqdOg%40mail.gmail.com [2] https://www.postgresql.org/message-id/ZgvI9iAUWCZ17z5V%40ip-10-97-1-34.eu-west-3.compute.internal [3] https://www.postgresql.org/message-id/CAJpy0uCQ2PDCAqcnbdOz6q_ZqmBfMyBpVqKDqL_XZBP%3DeK-1yw%40mail.gmail.com Best Regards, Hou zj
Attachment
On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > > 1. Can we just remove pg_logical_replication_slot_advance and use > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > > pg_logical_replication_slot_advance? > > > > + * Advance our logical replication slot forward. See > > + * LogicalSlotAdvanceAndCheckSnapState for details. > > */ > > static XLogRecPtr > > pg_logical_replication_slot_advance(XLogRecPtr moveto) > > { > > It was commented[1] that it's not appropriate for the > pg_logical_replication_slot_advance to have an out parameter > 'ready_for_decoding' which looks bit awkward as the functionality doesn't match > the name, and is also not consistent with the style of > pg_physical_replication_slot_advance(). So, we added a new function. I disagree here. A new function just for a parameter is not that great IMHO. I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto, bool *found_consistent_snapshot) to pg_logical_replication_slot_advance(XLogRecPtr moveto, bool *found_consistent_snapshot) and use it. If others don't like this, I'd at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a static inline function. > > 5. As far as the test case for this issue is concerned, I'm fine with > > adding one using an INJECTION point because we seem to be having no > > consistent way to control postgres writing current snapshot to WAL. > > Since me and my colleagues can reproduce the issue consistently after applying > 0002 and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] to > consider adding the INJECTION point after pushing the main fix. Right. > > 7. > > + /* > > + * We need to access the system tables during decoding to build the > > + * logical changes unless we are in fast-forward mode where no changes > > are > > + * generated. > > + */ > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > > > May I know if we need this change for this fix? > > The slotsync worker needs to advance the slots from different databases in > fast_forward. So, we need to skip this check in fast_forward mode. The analysis can > be found in [3]. - if (slot->data.database != MyDatabaseId) + /* + * We need to access the system tables during decoding to build the + * logical changes unless we are in fast_forward mode where no changes are + * generated. + */ + if (slot->data.database != MyDatabaseId && !fast_forward) ereport(ERROR, It's not clear from the comment that we need it for a slotsync worker to advance the slots from different databases. Can this be put into the comment? Also, specify in the comment, why this is safe? Also, if this change is needed for only slotsync workers, why not protect it with IsSyncingReplicationSlots()? Otherwise, it might impact non-slotsync callers, no? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
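To make the last question above concrete, the reviewer appears to be asking whether the relaxed check could be scoped roughly as in the sketch below, rather than applying to every fast_forward caller. This is only an illustration of the question, not proposed code; it reuses the error text and the IsSyncingReplicationSlots() helper quoted earlier in the thread.

```c
/*
 * Illustration of the reviewer's question: restrict the cross-database
 * exemption to the slot sync path instead of all fast_forward callers.
 */
if (slot->data.database != MyDatabaseId &&
	!(fast_forward && IsSyncingReplicationSlots()))
	ereport(ERROR,
			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
			 errmsg("replication slot \"%s\" was not created in this database",
					NameStr(slot->data.name))));
```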
On Tue, Apr 2, 2024 at 7:42 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > > 1. Can we just remove pg_logical_replication_slot_advance and use > > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > > > pg_logical_replication_slot_advance? > > > > > > + * Advance our logical replication slot forward. See > > > + * LogicalSlotAdvanceAndCheckSnapState for details. > > > */ > > > static XLogRecPtr > > > pg_logical_replication_slot_advance(XLogRecPtr moveto) > > > { > > > > It was commented[1] that it's not appropriate for the > > pg_logical_replication_slot_advance to have an out parameter > > 'ready_for_decoding' which looks bit awkward as the functionality doesn't match > > the name, and is also not consistent with the style of > > pg_physical_replication_slot_advance(). So, we added a new function. > > I disagree here. A new function just for a parameter is not that great > IMHO. > It is not for the parameter but primarily for the functionality it provides. The additional functionality of whether we reached a consistent point while advancing the slot doesn't sound to suit the current function. Also, we want to keep the signature similar to the existing function pg_physical_replication_slot_advance(). > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > moveto, bool *found_consistent_snapshot) to > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > *found_consistent_snapshot) and use it. If others don't like this, I'd > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a > static inline function. > Yeah, we can do that but it is not a performance-sensitive routine so don't know if it is worth it. > > > 5. As far as the test case for this issue is concerned, I'm fine with > > > adding one using an INJECTION point because we seem to be having no > > > consistent way to control postgres writing current snapshot to WAL. > > > > Since me and my colleagues can reproduce the issue consistently after applying > > 0002 and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] to > > consider adding the INJECTION point after pushing the main fix. > > Right. > > > > 7. > > > + /* > > > + * We need to access the system tables during decoding to build the > > > + * logical changes unless we are in fast-forward mode where no changes > > > are > > > + * generated. > > > + */ > > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > > > > > May I know if we need this change for this fix? > > > > The slotsync worker needs to advance the slots from different databases in > > fast_forward. So, we need to skip this check in fast_forward mode. The analysis can > > be found in [3]. > - if (slot->data.database != MyDatabaseId) > + /* > + * We need to access the system tables during decoding to build the > + * logical changes unless we are in fast_forward mode where no changes are > + * generated. > + */ > + if (slot->data.database != MyDatabaseId && !fast_forward) > ereport(ERROR, > > It's not clear from the comment that we need it for a slotsync worker > to advance the slots from different databases. Can this be put into > the comment? Also, specify in the comment, why this is safe? > It is not specific to slot sync worker but specific to fast_forward mode. 
There is already a comment "We need to access the system tables during decoding to build the logical changes unless we are in fast_forward mode where no changes are generated." telling why it is safe. The point is that we need database access to read system tables while generating the logical changes, and in fast-forward mode we don't generate logical changes, so this check is not required. Do let me know if you have a different understanding or if my understanding is incorrect. -- With Regards, Amit Kapila.
On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > moveto, bool *found_consistent_snapshot) to > > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > > *found_consistent_snapshot) and use it. If others don't like this, I'd > > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a > > static inline function. > > > Yeah, we can do that but it is not a performance-sensitive routine so > don't know if it is worth it. Okay for what the patch has right now. No more bikeshedding from me on this. > > > The slotsync worker needs to advance the slots from different databases in > > > fast_forward. So, we need to skip this check in fast_forward mode. The analysis can > > > be found in [3]. > > - if (slot->data.database != MyDatabaseId) > > + /* > > + * We need to access the system tables during decoding to build the > > + * logical changes unless we are in fast_forward mode where no changes are > > + * generated. > > + */ > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > > > It's not clear from the comment that we need it for a slotsync worker > > to advance the slots from different databases. Can this be put into > > the comment? Also, specify in the comment, why this is safe? > > > It is not specific to slot sync worker but specific to fast_forward > mode. There is already a comment "We need to access the system tables > during decoding to build the logical changes unless we are in > fast_forward mode where no changes are generated." telling why it is > safe. The point is we need database access to access system tables > while generating the logical changes and in fast-forward mode, we > don't generate logical changes so this check is not required. Do let > me if you have a different understanding or if my understanding is > incorrect. Understood. Thanks. Just curious, why isn't a problem for the existing fast_forward mode callers pg_replication_slot_advance and LogicalReplicationSlotHasPendingWal? I quickly looked at v8, and have a nit, rest all looks good. + if (DecodingContextReady(ctx) && found_consistent_snapshot) + *found_consistent_snapshot = true; Can the found_consistent_snapshot be checked first to help avoid the function call DecodingContextReady() for pg_replication_slot_advance callers? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > > moveto, bool *found_consistent_snapshot) to > > > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > > > *found_consistent_snapshot) and use it. If others don't like this, I'd > > > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a > > > static inline function. > > > > > Yeah, we can do that but it is not a performance-sensitive routine so > > don't know if it is worth it. > > Okay for what the patch has right now. No more bikeshedding from me on this. > > > > > The slotsync worker needs to advance the slots from different databases in > > > > fast_forward. So, we need to skip this check in fast_forward mode. The analysis can > > > > be found in [3]. > > > - if (slot->data.database != MyDatabaseId) > > > + /* > > > + * We need to access the system tables during decoding to build the > > > + * logical changes unless we are in fast_forward mode where no changes are > > > + * generated. > > > + */ > > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > > ereport(ERROR, > > > > > > It's not clear from the comment that we need it for a slotsync worker > > > to advance the slots from different databases. Can this be put into > > > the comment? Also, specify in the comment, why this is safe? > > > > > It is not specific to slot sync worker but specific to fast_forward > > mode. There is already a comment "We need to access the system tables > > during decoding to build the logical changes unless we are in > > fast_forward mode where no changes are generated." telling why it is > > safe. The point is we need database access to access system tables > > while generating the logical changes and in fast-forward mode, we > > don't generate logical changes so this check is not required. Do let > > me if you have a different understanding or if my understanding is > > incorrect. > > Understood. Thanks. Just curious, why isn't a problem for the existing > fast_forward mode callers pg_replication_slot_advance and > LogicalReplicationSlotHasPendingWal? > We call those after connecting to the database and the slot also belongs to that database whereas during synchronization of slots standby. the slots could be from different databases. > I quickly looked at v8, and have a nit, rest all looks good. > > + if (DecodingContextReady(ctx) && found_consistent_snapshot) > + *found_consistent_snapshot = true; > > Can the found_consistent_snapshot be checked first to help avoid the > function call DecodingContextReady() for pg_replication_slot_advance > callers? > Okay, changed. Additionally, I have updated the comments and commit message. I'll push this patch after some more testing. -- With Regards, Amit Kapila.
Attachment
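The nit being addressed amounts to a short-circuit reordering, roughly as sketched below (illustrative only; ctx is the decoding context built inside the advance function):

```c
/* Before: DecodingContextReady() runs even when the caller passed NULL. */
if (DecodingContextReady(ctx) && found_consistent_snapshot)
	*found_consistent_snapshot = true;

/*
 * After: checking the out parameter first lets callers that pass NULL
 * (such as pg_replication_slot_advance) skip the DecodingContextReady()
 * call entirely.
 */
if (found_consistent_snapshot && DecodingContextReady(ctx))
	*found_consistent_snapshot = true;
```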
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > + if (DecodingContextReady(ctx) && found_consistent_snapshot) > > + *found_consistent_snapshot = true; > > > > Can the found_consistent_snapshot be checked first to help avoid the > > function call DecodingContextReady() for pg_replication_slot_advance > > callers? > > > > Okay, changed. Additionally, I have updated the comments and commit > message. I'll push this patch after some more testing. > Pushed! -- With Regards, Amit Kapila.
On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > + if (DecodingContextReady(ctx) && found_consistent_snapshot) > > > + *found_consistent_snapshot = true; > > > > > > Can the found_consistent_snapshot be checked first to help avoid the > > > function call DecodingContextReady() for pg_replication_slot_advance > > > callers? > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > message. I'll push this patch after some more testing. > > > > Pushed! While testing this change, I realized that it could happen that the server logs are flooded with the following logical decoding logs that are written every 200 ms: 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot "test_sub" 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions committing after 0/50006F48, reading WAL from 0/50006F10. 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found consistent point at 0/50006F10 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running transactions. 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot "test_sub" 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions committing after 0/50006F48, reading WAL from 0/50006F10. 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found consistent point at 0/50006F10 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running transactions. For example, I could reproduce it with the following steps: 1. create the primary and start. 2. run "pgbench -i -s 100" on the primary. 3. run pg_basebackup to create the standby. 4. configure slotsync setup on the standby and start. 5. create a publication for all tables on the primary. 6. create the subscriber and start. 7. run "pgbench -i -Idtpf" on the subscriber. 8. create a subscription on the subscriber (initial data copy will start). The logical decoding logs were written every 200 ms during the initial data synchronization. Looking at the new changes for update_local_synced_slot(): if (remote_slot->confirmed_lsn != slot->data.confirmed_flush || remote_slot->restart_lsn != slot->data.restart_lsn || remote_slot->catalog_xmin != slot->data.catalog_xmin) { /* * We can't directly copy the remote slot's LSN or xmin unless there * exists a consistent snapshot at that point. Otherwise, after * promotion, the slots may not reach a consistent point before the * confirmed_flush_lsn which can lead to a data loss. To avoid data * loss, we let slot machinery advance the slot which ensures that * snapbuilder/slot statuses are updated properly. */ if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) { /* * Update the slot info directly if there is a serialized snapshot * at the restart_lsn, as the slot can quickly reach consistency * at restart_lsn by restoring the snapshot. 
*/ SpinLockAcquire(&slot->mutex); slot->data.restart_lsn = remote_slot->restart_lsn; slot->data.confirmed_flush = remote_slot->confirmed_lsn; slot->data.catalog_xmin = remote_slot->catalog_xmin; slot->effective_catalog_xmin = remote_slot->catalog_xmin; SpinLockRelease(&slot->mutex); if (found_consistent_snapshot) *found_consistent_snapshot = true; } else { LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn, found_consistent_snapshot); } ReplicationSlotsComputeRequiredXmin(false); ReplicationSlotsComputeRequiredLSN(); slot_updated = true; We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, restart_lsn, and catalog_xmin is different between the remote slot and the local slot. In my test case, during the initial sync performing, only catalog_xmin was different and there was no serialized snapshot at restart_lsn, and the slotsync worker called LogicalSlotAdvanceAndCheckSnapState(). However no slot properties were changed even after the function and it set slot_updated = true. So it starts the next slot synchronization after 200ms. It seems to me that we can skip calling LogicalSlotAdvanceAndCheckSnapState() at least when the remote and local have the same restart_lsn and confirmed_lsn. I'm not sure there are other scenarios but is it worth fixing this symptom? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
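One possible reading of that suggestion, expressed as a sketch against the snippet quoted above (this is not a proposed patch, and how a catalog_xmin-only change should be handled is exactly the open question):

```c
/*
 * Sketch: only take the copy/advance path when an LSN actually moved;
 * if only catalog_xmin differs, update it directly instead of starting
 * logical decoding on every 200 ms sync cycle.
 */
if (remote_slot->confirmed_lsn != slot->data.confirmed_flush ||
	remote_slot->restart_lsn != slot->data.restart_lsn)
{
	/* existing logic: copy from a serialized snapshot, or advance */
}
else if (remote_slot->catalog_xmin != slot->data.catalog_xmin)
{
	/* only xmin changed: no decoding needed, just carry it over */
	SpinLockAcquire(&slot->mutex);
	slot->data.catalog_xmin = remote_slot->catalog_xmin;
	slot->effective_catalog_xmin = remote_slot->catalog_xmin;
	SpinLockRelease(&slot->mutex);
}
```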
On Wed, Apr 3, 2024 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > + if (DecodingContextReady(ctx) && found_consistent_snapshot) > > > + *found_consistent_snapshot = true; > > > > > > Can the found_consistent_snapshot be checked first to help avoid the > > > function call DecodingContextReady() for pg_replication_slot_advance > > > callers? > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > message. I'll push this patch after some more testing. > > > > Pushed! There is an intermittent BF failure observed at [1] after this commit (2ec005b). Analysis: We see in BF logs, that during the first call of the sync function, restart_lsn of the synced slot is advanced to a lsn which is > than remote slot's restart_lsn. And when sync call is done second time without any further change on primary, it hits the error: ERROR: cannot synchronize local slot "lsub1_slot" LSN(0/3000060) to remote slot's LSN(0/3000028) as synchronization would move it backwards Relevant BF logs are given at [2]. This reproduces intermittently depending upon if bgwriter logs running txn record when the test is running. We were able to mimic the test case to reproduce the failure. Please see attached bf-test.txt for steps. Issue: Issue is that we are having a wrong sanity check based on 'restart_lsn' in synchronize_one_slot(): if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, ...); Prior to commit 2ec005b, this check was okay, as we did not expect restart_lsn of the synced slot to be ahead of remote since we were directly copying the lsns. But now when we use 'advance' to do logical decoding on standby, there is a possibility that restart lsn of the synced slot is ahead of remote slot, if there are running txns records found after reaching consistent-point while consuming WALs from restart_lsn till confirmed_lsn. In such a case, slot-sync's advance may end up serializing snapshots and setting restart_lsn to the serialized snapshot point, ahead of remote one. Fix: The sanity check needs to be corrected. Attached a patch to address the issue. a) The sanity check is corrected to compare confirmed_lsn rather than restart_lsn. Additional changes: b) A log has been added after LogicalSlotAdvanceAndCheckSnapState() to log the case when the local and remote slots' confirmed-lsn were not found to be the same after sync (if at all). c) Now we attempt to sync in update_local_synced_slot() if one of confirmed_lsn, restart_lsn, and catalog_xmin for remote slot is ahead of local slot instead of them just being unequal. [1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2017%3A57%3A28 [2]: 2024-04-03 18:00:41.896 UTC [3239617][client backend][0/2:0] LOG: statement: SELECT pg_sync_replication_slots(); LOG: starting logical decoding for slot "lsub1_slot" DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/3000028. 
LOG: logical decoding found consistent point at 0/3000028 DEBUG: serializing snapshot to pg_logical/snapshots/0-3000060.snap DEBUG: got new restart lsn 0/3000060 at 0/3000060 LOG: newly created slot "lsub1_slot" is sync-ready now 2024-04-03 18:00:45.218 UTC [3243487][client backend][2/2:0] LOG: statement: SELECT pg_sync_replication_slots(); ERROR: cannot synchronize local slot "lsub1_slot" LSN(0/3000060) to remote slot's LSN(0/3000028) as synchronization would move it backwards thanks Shveta
Attachment
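For readers skimming the thread, the sanity-check correction described in the preceding message is roughly the following. This is a sketch based on that description, not the attached patch; the error text is abbreviated.

```c
/*
 * Sketch of the corrected sanity check in synchronize_one_slot().
 * After commit 2ec005b the synced slot's restart_lsn can legitimately
 * move ahead of the remote one (snapshot serialization during advance),
 * so the backward-movement check is based on confirmed_flush instead.
 */

/* old check: can now fire spuriously */
if (remote_slot->restart_lsn < slot->data.restart_lsn)
	elog(ERROR,
		 "cannot synchronize local slot \"%s\"",
		 remote_slot->name);

/* proposed check */
if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
	elog(ERROR,
		 "cannot synchronize local slot \"%s\"",
		 remote_slot->name);
```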
On Thu, Apr 4, 2024 at 1:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > While testing this change, I realized that it could happen that the > server logs are flooded with the following logical decoding logs that > are written every 200 ms: > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding > for slot "test_sub" ... ... > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial > data synchronization. > > Looking at the new changes for update_local_synced_slot(): > > if (remote_slot->confirmed_lsn != slot->data.confirmed_flush || > remote_slot->restart_lsn != slot->data.restart_lsn || > remote_slot->catalog_xmin != slot->data.catalog_xmin) > { > /* > * We can't directly copy the remote slot's LSN or xmin unless there > * exists a consistent snapshot at that point. Otherwise, after > * promotion, the slots may not reach a consistent point before the > * confirmed_flush_lsn which can lead to a data loss. To avoid data > * loss, we let slot machinery advance the slot which ensures that > * snapbuilder/slot statuses are updated properly. > */ > if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > { > /* > * Update the slot info directly if there is a serialized snapshot > * at the restart_lsn, as the slot can quickly reach consistency > * at restart_lsn by restoring the snapshot. > */ > SpinLockAcquire(&slot->mutex); > slot->data.restart_lsn = remote_slot->restart_lsn; > slot->data.confirmed_flush = remote_slot->confirmed_lsn; > slot->data.catalog_xmin = remote_slot->catalog_xmin; > slot->effective_catalog_xmin = remote_slot->catalog_xmin; > SpinLockRelease(&slot->mutex); > > if (found_consistent_snapshot) > *found_consistent_snapshot = true; > } > else > { > LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn, > found_consistent_snapshot); > } > > ReplicationSlotsComputeRequiredXmin(false); > ReplicationSlotsComputeRequiredLSN(); > > slot_updated = true; > > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and > the local slot. In my test case, during the initial sync performing, > only catalog_xmin was different and there was no serialized snapshot > at restart_lsn, and the slotsync worker called > LogicalSlotAdvanceAndCheckSnapState(). However no slot properties were > changed even after the function and it set slot_updated = true. So it > starts the next slot synchronization after 200ms. > > It seems to me that we can skip calling > LogicalSlotAdvanceAndCheckSnapState() at least when the remote and > local have the same restart_lsn and confirmed_lsn. > I think we can do that but do we know what caused catalog_xmin to be updated regularly without any change in restart/confirmed_flush LSN? I think the LSNs are not updated during the initial sync (copy) time but how catalog_xmin is getting updated for the same slot? 
BTW, if we see, we will probably accept this xmin anyway as it is due to the following code in LogicalIncreaseXminForSlot() LogicalIncreaseXminForSlot() { /* * If the client has already confirmed up to this lsn, we directly can * mark this as accepted. This can happen if we restart decoding in a * slot. */ else if (current_lsn <= slot->data.confirmed_flush) { slot->candidate_catalog_xmin = xmin; slot->candidate_xmin_lsn = current_lsn; /* our candidate can directly be used */ updated_xmin = true; } > I'm not sure there are other scenarios but is it worth fixing this symptom? > I think so but let's investigate this a bit more. BTW, while thinking on this one, I noticed that in the function LogicalConfirmReceivedLocation(), we first update the disk copy (see comment [1]) and then the in-memory value, whereas the same is not true in update_local_synced_slot() for the case when a snapshot exists. Now, do we have the same risk here in case of standby? Because I think we will use these xmins while sending the feedback message (in XLogWalRcvSendHSFeedback()). [1] /* * We have to write the changed xmin to disk *before* we change * the in-memory value, otherwise after a crash we wouldn't know * that some catalog tuples might have been removed already. -- With Regards, Amit Kapila.
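To make the skip suggested above concrete, a minimal sketch of such a guard inside update_local_synced_slot() could look like the following. The variable and function names are taken from the excerpt quoted earlier in the thread; how a lone catalog_xmin change should then be handled is exactly what is still being debated here, so treat this as an illustration rather than the eventual fix.

    /*
     * Sketch only (not a committed change): skip the decoding-based advance
     * when neither LSN has moved, so the slotsync worker does not restart
     * logical decoding every cycle just because catalog_xmin differs.
     */
    bool        lsns_changed;

    lsns_changed = (remote_slot->confirmed_lsn != slot->data.confirmed_flush ||
                    remote_slot->restart_lsn != slot->data.restart_lsn);

    if (lsns_changed)
    {
        if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
        {
            /* ... copy the LSNs and xmins directly, as in the excerpt above ... */
        }
        else
            LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
                                                found_consistent_snapshot);

        ReplicationSlotsComputeRequiredXmin(false);
        ReplicationSlotsComputeRequiredLSN();
        slot_updated = true;
    }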
On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > > Prior to commit 2ec005b, this check was okay, as we did not expect > restart_lsn of the synced slot to be ahead of remote since we were > directly copying the lsns. But now when we use 'advance' to do logical > decoding on standby, there is a possibility that restart lsn of the > synced slot is ahead of remote slot, if there are running txns records > found after reaching consistent-point while consuming WALs from > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > may end up serializing snapshots and setting restart_lsn to the > serialized snapshot point, ahead of remote one. > > Fix: > The sanity check needs to be corrected. Attached a patch to address the issue. Please find v2 which has detailed commit-msg and some more comments in code. thanks Shveta
Attachment
Hi, On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > restart_lsn of the synced slot to be ahead of remote since we were > > directly copying the lsns. But now when we use 'advance' to do logical > > decoding on standby, there is a possibility that restart lsn of the > > synced slot is ahead of remote slot, if there are running txns records > > found after reaching consistent-point while consuming WALs from > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > > may end up serializing snapshots and setting restart_lsn to the > > serialized snapshot point, ahead of remote one. > > > > Fix: > > The sanity check needs to be corrected. Attached a patch to address the issue. > Thanks for reporting, explaining the issue and providing a patch. Regarding the patch: 1 === + * Attempt to sync lsns and xmins only if remote slot is ahead of local s/lsns/LSNs/? 2 === + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) + elog(LOG, + "could not synchronize local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) ", + remote_slot->name, + LSN_FORMAT_ARGS(slot->data.confirmed_flush), + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); I don't think that the message is correct here. Unless I am missing something there is nothing in the following code path that would prevent the slot to be sync during this cycle. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > > restart_lsn of the synced slot to be ahead of remote since we were > > > directly copying the lsns. But now when we use 'advance' to do logical > > > decoding on standby, there is a possibility that restart lsn of the > > > synced slot is ahead of remote slot, if there are running txns records > > > found after reaching consistent-point while consuming WALs from > > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > > > may end up serializing snapshots and setting restart_lsn to the > > > serialized snapshot point, ahead of remote one. > > > > > > Fix: > > > The sanity check needs to be corrected. Attached a patch to address the issue. > > > > Thanks for reporting, explaining the issue and providing a patch. > > Regarding the patch: > > 1 === > > + * Attempt to sync lsns and xmins only if remote slot is ahead of local > > s/lsns/LSNs/? > > 2 === > > + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) > + elog(LOG, > + "could not synchronize local slot \"%s\" LSN(%X/%X)" > + " to remote slot's LSN(%X/%X) ", > + remote_slot->name, > + LSN_FORMAT_ARGS(slot->data.confirmed_flush), > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); > > I don't think that the message is correct here. Unless I am missing something > there is nothing in the following code path that would prevent the slot to be > sync during this cycle. This is a sanity check, I will put a comment to indicate the same. We want to ensure if anything changes in future, we get correct logs to indicate that. If required, the LOG msg can be changed. Kindly suggest if you have anything better in mind. thanks Shveta
Hi, On Fri, Apr 05, 2024 at 09:43:35AM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > 2 === > > > > + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) > > + elog(LOG, > > + "could not synchronize local slot \"%s\" LSN(%X/%X)" > > + " to remote slot's LSN(%X/%X) ", > > + remote_slot->name, > > + LSN_FORMAT_ARGS(slot->data.confirmed_flush), > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); > > > > I don't think that the message is correct here. Unless I am missing something > > there is nothing in the following code path that would prevent the slot to be > > sync during this cycle. > > This is a sanity check, I will put a comment to indicate the same. Thanks! > We > want to ensure if anything changes in future, we get correct logs to > indicate that. Right, understood that way. > If required, the LOG msg can be changed. Kindly suggest if you have > anything better in mind. > What about something like? ereport(LOG, errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from remote slot", remote_slot->name), errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", LSN_FORMAT_ARGS(remote_slot->restart_lsn), LSN_FORMAT_ARGS(slot->data.restart_lsn)); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > What about something like? > > ereport(LOG, > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from remote slot", > remote_slot->name), > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Regards, +1. Better than earlier. I will update and post the patch. thanks Shveta
Hi, On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > What about something like? > > > > ereport(LOG, > > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from remote slot", > > remote_slot->name), > > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > > LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > > > Regards, > > +1. Better than earlier. I will update and post the patch. > Thanks! BTW, I just realized that the LSN I used in my example in the LSN_FORMAT_ARGS() are not the right ones. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > BTW, I just realized that the LSN I used in my example in the LSN_FORMAT_ARGS() > are not the right ones. Noted. Thanks. Please find v3 with the comments addressed. thanks Shveta
Attachment
On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > There is an intermittent BF failure observed at [1] after this commit (2ec005b). > Thanks for analyzing and providing the patch. I'll look into it. There is another BF failure [1] which I have analyzed. The main reason for failure is the following test: # Failed test 'logical slots have synced as true on standby' # at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 198. # got: 'f' # expected: 't' Here, we are expecting that the two logical slots (lsub1_slot, and lsub2_slot), one created via subscription and another one via API pg_create_logical_replication_slot() are synced. The standby LOGs which are as follows show that the one created by API 'lsub2_slot' is synced but not the other one 'lsub1_slot': LOG for lsub1_slot: ================ 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/3000060. 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: could not sync slot "lsub1_slot" LOG for lsub2_slot: ================ 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: newly created slot "lsub2_slot" is sync-ready now 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); We can see from the log of lsub1_slot that its restart_lsn is 0/3000060 which means it will start reading from the WAL from that location. Now, if we check the publisher log, we have a running_xacts record at that location. See following LOGs: 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: statement: SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true); 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000060 oldest xid 740 latest complete 739 next xid 740) .... .... 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 latest complete 739 next xid 740) The first running_xact record ends at 3000060 and the second one at 3000098. So, the start location of the second running_xact is 3000060, the same can be confirmed by the following LOG line of walsender: 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: serializing snapshot to pg_logical/snapshots/0-3000060.snap This shows that while processing running_xact at location 3000060, we have serialized the snapshot. As there is no running transaction in WAL at 3000060, ideally we should have reached a consistent state after processing that record on standby. But the reason the standby didn't process that record is that the confirmed_flush LSN is also at the same location so the function LogicalSlotAdvanceAndCheckSnapState() exits without reading the WAL at that location. 
Now, this can be confirmed by the below walsender-specific LOG in publisher: 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write 0/3000060 flush 0/3000060 apply 0/3000060 reply_time 2024-04-05 04:36:59.155181+00 We update the confirmed_flush location with the flush location after receiving the above feedback. You can notice that we didn't receive the feedback for the 3000098 location and hence both the confirmed_flush and restart_lsn are at the same location 0/3000060. Now, the test is waiting for the subscriber to send feedback of the last WAL write location by $primary->wait_for_catchup('regress_mysub1'); As noticed from the publisher LOGs, the query we used for wait is: SELECT '0/3000060' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('regress_mysub1', 'walreceiver') Here, instead of '0/3000060' it should have used ''0/3000098' which is the last write location. This position we get via function pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And this variable seems to be touched by commit c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely could c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At this stage, I am not sure so just sharing with others to see if what I am saying sounds logical. I'll think more about this. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-04-05%2004%3A34%3A27 -- With Regards, Amit Kapila.
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > There is an intermittent BF failure observed at [1] after this commit (2ec005b). > > > > Thanks for analyzing and providing the patch. I'll look into it. There > is another BF failure [1] which I have analyzed. The main reason for > failure is the following test: > > # Failed test 'logical slots have synced as true on standby' > # at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 198. > # got: 'f' > # expected: 't' > > Here, we are expecting that the two logical slots (lsub1_slot, and > lsub2_slot), one created via subscription and another one via API > pg_create_logical_replication_slot() are synced. The standby LOGs > which are as follows show that the one created by API 'lsub2_slot' is > synced but the other one 'lsub1_slot': > > LOG for lsub1_slot: > ================ > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: > Streaming transactions committing after 0/0, reading WAL from > 0/3000060. > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: > could not sync slot "lsub1_slot" > > LOG for lsub2_slot: > ================ > 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: > newly created slot "lsub2_slot" is sync-ready now > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > > We can see from the log of lsub1_slot that its restart_lsn is > 0/3000060 which means it will start reading from the WAL from that > location. Now, if we check the publisher log, we have running_xacts > record at that location. See following LOGs: > > 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: > statement: SELECT pg_create_logical_replication_slot('lsub2_slot', > 'test_decoding', false, false, true); > 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/3000060 oldest xid 740 > latest complete 739 next xid 740) > .... > .... > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 > latest complete 739 next xid 740) > > The first running_xact record ends at 3000060 and the second one at > 3000098. So, the start location of the second running_xact is 3000060, > the same can be confirmed by the following LOG line of walsender: > > 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: > serializing snapshot to pg_logical/snapshots/0-3000060.snap > > This shows that while processing running_xact at location 3000060, we > have serialized the snapshot. As there is no running transaction in > WAL at 3000060 so ideally we should have reached a consistent state > after processing that record on standby. But the reason standby didn't > process that LOG is that the confirmed_flush LSN is also at the same > location so the function LogicalSlotAdvanceAndCheckSnapState() exits > without reading the WAL at that location. 
Now, this can be confirmed > by the below walsender-specific LOG in publisher: > > 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write > 0/3000060 flush 0/3000060 apply 0/3000060 reply_time 2024-04-05 > 04:36:59.155181+00 > > We update the confirmed_flush location with the flush location after > receiving the above feedback. You can notice that we didn't receive > the feedback for the 3000098 location and hence both the > confirmed_flush and restart_lsn are at the same location 0/3000060. > Now, the test is waiting for the subscriber to send feedback of the > last WAL write location by > $primary->wait_for_catchup('regress_mysub1'); As noticed from the > publisher LOGs, the query we used for wait is: > > SELECT '0/3000060' <= replay_lsn AND state = 'streaming' > FROM pg_catalog.pg_stat_replication > WHERE application_name IN ('regress_mysub1', 'walreceiver') > > Here, instead of '0/3000060' it should have used ''0/3000098' which is > the last write location. This position we get via function > pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And > this variable seems to be touched by commit > c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely could > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At > this stage, I am not sure so just sharing with others to see if what I > am saying sounds logical. I'll think more about this. > Thinking more on this, it doesn't seem related to c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change any locking or something like that which impacts write positions. I think what has happened here is that running_xact record written by the background writer [1] is not written to the kernel or disk (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the current_lsn to be compared with replayed LSN. Note that the reason why walsender has picked the running_xact written by background writer is because it has checked after pg_current_wal_lsn() query, see LOGs [2]. I think we can probably try to reproduce manually via debugger. If this theory is correct then I think we will need to use injection points to control the behavior of bgwriter or use the slots created via SQL API for syncing in tests. Thoughts? [1] - 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 latest complete 739 next xid 740) [2] - 2024-04-05 04:37:05.134 UTC [3866413][client backend][1/4:0] LOG: statement: SELECT pg_current_wal_lsn() 2024-04-05 04:37:05.144 UTC [3866413][client backend][:0] LOG: disconnection: session time: 0:00:00.021 user=bf database=postgres host=[local] 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: serializing snapshot to pg_logical/snapshots/0-3000060.snap -- With Regards, Amit Kapila.
Hi, On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Thinking more on this, it doesn't seem related to > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > any locking or something like that which impacts write positions. Agree. > I think what has happened here is that running_xact record written by > the background writer [1] is not written to the kernel or disk (see > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > current_lsn to be compared with replayed LSN. Agree, I think it's not visible through pg_current_wal_lsn() yet. Also I think that the DEBUG message in LogCurrentRunningXacts() " elog(DEBUG2, "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)", CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, LSN_FORMAT_ARGS(recptr), CurrRunningXacts->oldestRunningXid, CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); " should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released, see: \watch on Session 1 provides: pg_current_wal_lsn -------------------- 0/87D110 (1 row) Until: Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN; (gdb) n 2581 bool wakeup = false; (gdb) 2584 SpinLockAcquire(&XLogCtl->info_lck); (gdb) 2585 RefreshXLogWriteResult(LogwrtResult); (gdb) 2586 sleeping = XLogCtl->WalWriterSleeping; (gdb) 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN; (gdb) 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN) (gdb) 2589 XLogCtl->asyncXactLSN = asyncXactLSN; (gdb) 2590 SpinLockRelease(&XLogCtl->info_lck); (gdb) p p/x (uint32) XLogCtl->asyncXactLSN $1 = 0x87d148 Then session 1 provides: pg_current_wal_lsn -------------------- 0/87D148 (1 row) So, when we see in the log: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn AND state= 'streaming' It's indeed possible that the new LSN was not visible yet (spinlock not released?) before the query began (because we can not rely on the time the DEBUG message has been emitted). > Note that the reason why > walsender has picked the running_xact written by background writer is > because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > I think we can probably try to reproduce manually via debugger. > > If this theory is correct It think it is. > then I think we will need to use injection > points to control the behavior of bgwriter or use the slots created > via SQL API for syncing in tests. > > Thoughts? I think that maybe as a first step we should move the "elog(DEBUG2," message as proposed above to help debugging (that could help to confirm the above theory). If the theory is proven then I'm not sure we need the extra complexity of injection point here, maybe just relying on the slots created via SQL API could be enough. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Apr 05, 2024 at 02:35:42PM +0000, Bertrand Drouvot wrote: > I think that maybe as a first step we should move the "elog(DEBUG2," message as > proposed above to help debugging (that could help to confirm the above theory). If you agree and think that makes sense, please find attached a tiny patch doing so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
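For readers following along, the reordering being proposed for LogCurrentRunningXacts() in standby.c is roughly as below. This is a heavily simplified sketch: the real function also builds the record with the usual WAL-insertion calls and handles the subxid_overflow case; only the position of the DEBUG2 message relative to XLogSetAsyncXactLSN() (both of which appear in the excerpts above) is the point here.

    recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS);

    /* Record the new LSN for the asynchronous-flush machinery first ... */
    XLogSetAsyncXactLSN(recptr);

    /*
     * ... and only then emit the DEBUG2 message, so that by the time it
     * shows up in the log XLogSetAsyncXactLSN() has already run, which in
     * the gdb experiment above is when pg_current_wal_lsn() starts
     * reporting the new LSN.  That keeps the log timestamps from being
     * misleading when correlating them with test queries.
     */
    elog(DEBUG2,
         "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)",
         CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt,
         LSN_FORMAT_ARGS(recptr),
         CurrRunningXacts->oldestRunningXid,
         CurrRunningXacts->latestCompletedXid,
         CurrRunningXacts->nextXid);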
On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Thinking more on this, it doesn't seem related to > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > > any locking or something like that which impacts write positions. > > Agree. > > > I think what has happened here is that running_xact record written by > > the background writer [1] is not written to the kernel or disk (see > > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > > current_lsn to be compared with replayed LSN. > > Agree, I think it's not visible through pg_current_wal_lsn() yet. > > Also I think that the DEBUG message in LogCurrentRunningXacts() > > " > elog(DEBUG2, > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)", > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, > LSN_FORMAT_ARGS(recptr), > CurrRunningXacts->oldestRunningXid, > CurrRunningXacts->latestCompletedXid, > CurrRunningXacts->nextXid); > " > > should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is > visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is > released, > I think the new LSN can be visible only when the corresponding WAL is written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it visible. In your experiment below, isn't it possible that in the meantime WAL writer has written that WAL due to which you are seeing an updated location? >see: > > \watch on Session 1 provides: > > pg_current_wal_lsn > -------------------- > 0/87D110 > (1 row) > > Until: > > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 > 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN; > (gdb) n > 2581 bool wakeup = false; > (gdb) > 2584 SpinLockAcquire(&XLogCtl->info_lck); > (gdb) > 2585 RefreshXLogWriteResult(LogwrtResult); > (gdb) > 2586 sleeping = XLogCtl->WalWriterSleeping; > (gdb) > 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN; > (gdb) > 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN) > (gdb) > 2589 XLogCtl->asyncXactLSN = asyncXactLSN; > (gdb) > 2590 SpinLockRelease(&XLogCtl->info_lck); > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN > $1 = 0x87d148 > > Then session 1 provides: > > pg_current_wal_lsn > -------------------- > 0/87D148 > (1 row) > > So, when we see in the log: > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098oldest xid 740 latest complete 739 next xid 740) > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn AND state= 'streaming' > > It's indeed possible that the new LSN was not visible yet (spinlock not released?) > before the query began (because we can not rely on the time the DEBUG message has > been emitted). > > > Note that the reason why > > walsender has picked the running_xact written by background writer is > > because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > > I think we can probably try to reproduce manually via debugger. > > > > If this theory is correct > > It think it is. > > > then I think we will need to use injection > > points to control the behavior of bgwriter or use the slots created > > via SQL API for syncing in tests. > > > > Thoughts? 
> > I think that maybe as a first step we should move the "elog(DEBUG2," message as > proposed above to help debugging (that could help to confirm the above theory). > I think I am missing how exactly moving DEBUG2 can confirm the above theory. > If the theory is proven then I'm not sure we need the extra complexity of > injection point here, maybe just relying on the slots created via SQL API could > be enough. > Yeah, that could be the first step. We can probably add an injection point to control the bgwrite behavior and then add tests involving walsender performing the decoding. But I think it is important to have sufficient tests in this area as I see they are quite helpful in uncovering the issues. -- With Regards, Amit Kapila.
On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > There are still a few pending issues to be fixed in this feature but otherwise, we have committed all the main patches, so I marked the CF entry corresponding to this work as committed. -- With Regards, Amit Kapila.
Hi, On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > I think the new LSN can be visible only when the corresponding WAL is > written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can > make it visible. In your experiment below, isn't it possible that in > the meantime WAL writer has written that WAL due to which you are > seeing an updated location? What I did is: session 1: select pg_current_wal_lsn();\watch 1 session 2: select pg_backend_pid(); terminal 1: tail -f logfile | grep -i snap terminal 2: gdb -p <backend pid of session 2> -ex 'b LogCurrentRunningXacts' + "continue" once in gdb session 2: SELECT pg_log_standby_snapshot(); That produces a break in the gdb session, then: Breakpoint 1, LogCurrentRunningXacts (CurrRunningXacts=0x5774f92f8da0 <CurrentRunningXactsData.13>) at standby.c:1346 1346 { (gdb) n 1350 Then next, next until the DEBUG message is emitted (confirmed in terminal 1). At this stage the DEBUG message shows the new LSN while session 1 still displays the previous LSN. Then once XLogSetAsyncXactLSN() is done in the gdb session (terminal 2), session 1 displays the new LSN. This is reproducible as desired. With more debugging I can see that when the spinlock is released in XLogSetAsyncXactLSN() then XLogWrite() is doing its job and then session 1 does see the new value (that happens in this order, and as you said that's expected). My point is that while the DEBUG message is emitted, session 1 still sees the old LSN (until the new LSN is visible). I think that we should emit the DEBUG message once session 1 can see the new value (if not, I think the timestamp of the DEBUG message can be misleading for debugging purposes). > I think I am missing how exactly moving DEBUG2 can confirm the above theory. I meant to say that instead of seeing: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn AND state = 'streaming' We would probably see something like: 2024-04-05 04:37:05.<something> UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/3000060' <= replay_lsn AND state = 'streaming' 2024-04-05 04:37:05.<something>+xx UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 latest complete 739 next xid 740) And then it would be clear that the query has run before the new LSN is visible. > > If the theory is proven then I'm not sure we need the extra complexity of > > injection point here, maybe just relying on the slots created via SQL API could > > be enough. > > > > Yeah, that could be the first step. We can probably add an injection > point to control the bgwrite behavior and then add tests involving > walsender performing the decoding. But I think it is important to have > sufficient tests in this area as I see they are quite helpful in > uncovering the issues. > Yeah agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > There are still a few pending issues to be fixed in this feature but > otherwise, we have committed all the main patches, so I marked the CF > entry corresponding to this work as committed. There are a a fair number of failures of 040_standby_failover_slots_sync in the buildfarm. It'd be nice to get those fixed soon-ish. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-04-06%2020%3A58%3A50 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-04-06%2015%3A18%3A08 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-06%2010%3A13%3A58 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2016%3A04%3A10 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-05%2014%3A59%3A40 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-05%2014%3A59%3A07 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2014%3A18%3A07 The symptoms are similar, but not entirely identical across all of them, I think. I've also seen a bunch of failures of this test locally. Greetings, Andres Freund
On Sun, Apr 7, 2024 at 3:06 AM Andres Freund <andres@anarazel.de> wrote: > > On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > There are still a few pending issues to be fixed in this feature but > > otherwise, we have committed all the main patches, so I marked the CF > > entry corresponding to this work as committed. > > There are a a fair number of failures of 040_standby_failover_slots_sync in > the buildfarm. It'd be nice to get those fixed soon-ish. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-04-06%2020%3A58%3A50 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-04-06%2015%3A18%3A08 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-06%2010%3A13%3A58 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2016%3A04%3A10 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-05%2014%3A59%3A40 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-05%2014%3A59%3A07 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2014%3A18%3A07 > > The symptoms are similar, but not entirely identical across all of them, I think. > I have analyzed these failures and there are two different tests that are failing but the underlying reason is the same as being discussed with Bertrand. We are working on the fix. -- With Regards, Amit Kapila.
On Saturday, April 6, 2024 12:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > Thinking more on this, it doesn't seem related to > > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't > > > change any locking or something like that which impacts write positions. > > > > Agree. > > > > > I think what has happened here is that running_xact record written > > > by the background writer [1] is not written to the kernel or disk > > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the > > > current_lsn to be compared with replayed LSN. > > > > Agree, I think it's not visible through pg_current_wal_lsn() yet. > > > > Also I think that the DEBUG message in LogCurrentRunningXacts() > > > > " > > elog(DEBUG2, > > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest > xid %u latest complete %u next xid %u)", > > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, > > LSN_FORMAT_ARGS(recptr), > > CurrRunningXacts->oldestRunningXid, > > CurrRunningXacts->latestCompletedXid, > > CurrRunningXacts->nextXid); " > > > > should be located after the XLogSetAsyncXactLSN() call. Indeed, the > > new LSN is visible after the spinlock (XLogCtl->info_lck) in > > XLogSetAsyncXactLSN() is released, > > > > I think the new LSN can be visible only when the corresponding WAL is written > by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it > visible. In your experiment below, isn't it possible that in the meantime WAL > writer has written that WAL due to which you are seeing an updated location? > > >see: > > > > \watch on Session 1 provides: > > > > pg_current_wal_lsn > > -------------------- > > 0/87D110 > > (1 row) > > > > Until: > > > > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at > xlog.c:2579 > > 2579 XLogRecPtr WriteRqstPtr = asyncXactLSN; > > (gdb) n > > 2581 bool wakeup = false; > > (gdb) > > 2584 SpinLockAcquire(&XLogCtl->info_lck); > > (gdb) > > 2585 RefreshXLogWriteResult(LogwrtResult); > > (gdb) > > 2586 sleeping = XLogCtl->WalWriterSleeping; > > (gdb) > > 2587 prevAsyncXactLSN = XLogCtl->asyncXactLSN; > > (gdb) > > 2588 if (XLogCtl->asyncXactLSN < asyncXactLSN) > > (gdb) > > 2589 XLogCtl->asyncXactLSN = asyncXactLSN; > > (gdb) > > 2590 SpinLockRelease(&XLogCtl->info_lck); > > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN > > $1 = 0x87d148 > > > > Then session 1 provides: > > > > pg_current_wal_lsn > > -------------------- > > 0/87D148 > > (1 row) > > > > So, when we see in the log: > > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > > snapshot of 0+0 running transaction ids (lsn 0/3000098 oldest xid 740 > > latest complete 739 next xid 740) > > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: > statement: SELECT '0/3000060' <= replay_lsn AND state = 'streaming' > > > > It's indeed possible that the new LSN was not visible yet (spinlock > > not released?) before the query began (because we can not rely on the > > time the DEBUG message has been emitted). > > > > > Note that the reason why > > > walsender has picked the running_xact written by background writer > > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > > > I think we can probably try to reproduce manually via debugger. 
> > > > > > If this theory is correct > > > > It think it is. > > > > > then I think we will need to use injection points to control the > > > behavior of bgwriter or use the slots created via SQL API for > > > syncing in tests. > > > > > > Thoughts? > > > > I think that maybe as a first step we should move the "elog(DEBUG2," > > message as proposed above to help debugging (that could help to confirm > the above theory). > > > > I think I am missing how exactly moving DEBUG2 can confirm the above theory. > > > If the theory is proven then I'm not sure we need the extra complexity > > of injection point here, maybe just relying on the slots created via > > SQL API could be enough. > > > > Yeah, that could be the first step. We can probably add an injection point to > control the bgwrite behavior and then add tests involving walsender > performing the decoding. But I think it is important to have sufficient tests in > this area as I see they are quite helpful in uncovering the issues. Here is the patch to drop the subscription in the beginning so that the restart_lsn of the lsub1_slot won't be advanced due to concurrent xl_running_xacts from bgwriter. The subscription will be re-created after all the slots are sync-ready. I think maybe we can use this to stabilize the test as a first step and then think about how to make use of injection point to add more tests if it's worth it. Best Regards, Hou zj
Attachment
On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Saturday, April 6, 2024 12:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > Yeah, that could be the first step. We can probably add an injection point to > > control the bgwrite behavior and then add tests involving walsender > > performing the decoding. But I think it is important to have sufficient tests in > > this area as I see they are quite helpful in uncovering the issues. > > Here is the patch to drop the subscription in the beginning so that the > restart_lsn of the lsub1_slot won't be advanced due to concurrent > xl_running_xacts from bgwriter. The subscription will be re-created after all > the slots are sync-ready. I think maybe we can use this to stabilize the test > as a first step and then think about how to make use of injection point to add > more tests if it's worth it. > Pushed. -- With Regards, Amit Kapila.
On Monday, April 8, 2024 6:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Saturday, April 6, 2024 12:43 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Yeah, that could be the first step. We can probably add an injection > > > point to control the bgwrite behavior and then add tests involving > > > walsender performing the decoding. But I think it is important to > > > have sufficient tests in this area as I see they are quite helpful in uncovering > the issues. > > > > Here is the patch to drop the subscription in the beginning so that > > the restart_lsn of the lsub1_slot won't be advanced due to concurrent > > xl_running_xacts from bgwriter. The subscription will be re-created > > after all the slots are sync-ready. I think maybe we can use this to > > stabilize the test as a first step and then think about how to make > > use of injection point to add more tests if it's worth it. > > > > Pushed. Thanks for pushing. I checked the BF status, and noticed one BF failure, which I think is related to a miss in the test code. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 From the following log, I can see the sync failed because the standby is lagging behind of the failover slot. ----- # No postmaster PID for node "cascading_standby" error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot"snap_test_slot" is ahead of the standby position 0/4000114' while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECTpg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line2042. # Postmaster PID for node "publisher" is 3715298 ----- I think it's because we missed to call wait_for_replay_catchup before syncing slots. ----- $primary->safe_psql('postgres', "SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);" ); # ? missed to wait here $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); ----- While testing, I noticed another place where we were calling wait_for_replay_catchup before doing pg_replication_slot_advance, which also has a small possibility to cause the failover slot to be ahead of the standby if some logs are written in between these two steps. So, I adjusted them together. Here is a small patch to improve the test. Best Regards, Hou zj
Attachment
Hi, On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > Pushed. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 This unfortunately is a commit after commit 6f3d8d5e7cc Author: Amit Kapila <akapila@postgresql.org> Date: 2024-04-08 13:21:55 +0530 Fix the intermittent buildfarm failures in 040_standby_failover_slots_sync. Greetings, Andres
On Mon, Apr 8, 2024 at 9:49 PM Andres Freund <andres@anarazel.de> wrote: > > On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > > Pushed. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 > > This unfortunately is a commit after > Right, and thanks for the report. Hou-San has analyzed and shared the patch [1] for this yesterday. I'll review it today. [1] - https://www.postgresql.org/message-id/OS0PR01MB571665359F2F5DCD3ADABC9F94002%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Thanks for pushing. > > I checked the BF status, and noticed one BF failure, which I think is related to > a miss in the test code. > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 > > From the following log, I can see the sync failed because the standby is > lagging behind of the failover slot. > > ----- > # No postmaster PID for node "cascading_standby" > error running SQL: 'psql:<stdin>:1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot"snap_test_slot" is ahead of the standby position 0/4000114' > while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECTpg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line2042. > # Postmaster PID for node "publisher" is 3715298 > ----- > > I think it's because we missed to call wait_for_replay_catchup before syncing > slots. > > ----- > $primary->safe_psql('postgres', > "SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);" > ); > # ? missed to wait here > $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); > ----- > > While testing, I noticed another place where we were calling > wait_for_replay_catchup before doing pg_replication_slot_advance, which also has > a small possibility to cause the failover slot to be ahead of the standby if > some logs are written in between these two steps. So, I adjusted them together. > > Here is a small patch to improve the test. > LGTM. I'll push this tomorrow morning unless there are any more comments or suggestions. -- With Regards, Amit Kapila.
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: Hi, > On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > > > + if (DecodingContextReady(ctx) && > found_consistent_snapshot) > > > > + *found_consistent_snapshot = true; > > > > > > > > Can the found_consistent_snapshot be checked first to help avoid > > > > the function call DecodingContextReady() for > > > > pg_replication_slot_advance callers? > > > > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > > message. I'll push this patch after some more testing. > > > > > > > Pushed! > > While testing this change, I realized that it could happen that the server logs > are flooded with the following logical decoding logs that are written every 200 > ms: Thanks for reporting! > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running > transactions. > 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running > transactions. > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial data > synchronization. > > Looking at the new changes for update_local_synced_slot(): ... > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and the local > slot. In my test case, during the initial sync performing, only catalog_xmin was > different and there was no serialized snapshot at restart_lsn, and the slotsync > worker called LogicalSlotAdvanceAndCheckSnapState(). However no slot > properties were changed even after the function and it set slot_updated = true. > So it starts the next slot synchronization after 200ms. I was trying to reproduce this and check why the catalog_xmin is different among synced slot and remote slot, but I was not able to reproduce the case where there are lots of logical decoding logs. The script I used is attached. Would it be possible for you to share the script you used to reproduce this issue? 
Alternatively, could you please share the log files from both the primary and standby servers after reproducing the problem (it would be greatly helpful if you could set the log level to DEBUG2). Best Regards, Hou zj
Attachment
On Thursday, April 4, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > BTW, while thinking on this one, I > noticed that in the function LogicalConfirmReceivedLocation(), we first update > the disk copy, see comment [1] and then in-memory whereas the same is not > true in > update_local_synced_slot() for the case when snapshot exists. Now, do we have > the same risk here in case of standby? Because I think we will use these xmins > while sending the feedback message (in XLogWalRcvSendHSFeedback()). > > * We have to write the changed xmin to disk *before* we change > * the in-memory value, otherwise after a crash we wouldn't know > * that some catalog tuples might have been removed already. Yes, I think we have the risk on the standby, I can reproduce the case that if the server crashes after updating the in-memory value and before saving them to disk, the synced slot could be invalidated after restarting from crash, because the necessary rows have been removed on the primary. The steps can be found in [1]. I think we'd better fix the order in update_local_synced_slot() as well. I tried to make the fix in 0002, 0001 is Shveta's patch to fix another issue in this thread. Since they are touching the same function, so attach them together for review. [1] -- Primary: SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); -- Standby: SELECT 'init' FROM pg_create_logical_replication_slot('standbylogicalslot', 'test_decoding', false, false, false); SELECT pg_sync_replication_slots(); -- Primary: CREATE TABLE test (a int); INSERT INTO test VALUES(1); DROP TABLE test; SELECT txid_current(); SELECT txid_current(); SELECT txid_current(); SELECT pg_log_standby_snapshot(); SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn()); -- Standby: - wait for standby to replay all the changes on the primary. - this is to serialize snapshots. SELECT pg_replication_slot_advance('standbylogicalslot', pg_last_wal_replay_lsn()); - Use gdb to stop at the place after calling ReplicationSlotsComputexx() functions and before calling ReplicationSlotSave(). SELECT pg_sync_replication_slots(); -- Primary: - First, wait for the primary slot(the physical slot)'s catalog xmin to be updated to the same as the failover slot. VACUUM FULL; - Wait for VACUMM FULL to be replayed on standby. -- Standby: - For the process which is blocked by gdb, let the process crash (elog(PANIC, ...)). After restarting the standby from crash, we can see the synced slot is invalidated. LOG: invalidating obsolete replication slot "logicalslot" DETAIL: The slot conflicted with xid horizon 741. CONTEXT: WAL redo at 0/3059B90 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 741, isCatalogRel: T, nplans: 0, nredirected:0, ndead: 7, nunused: 0, dead: [22, 23, 24, 25, 26, 27, 28]; blkref #0: rel 1663/5/1249, blk 16 Best Regards, Hou zj
Attachment
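A rough sketch of the ordering being argued for in the snapshot-exists branch of update_local_synced_slot() is shown below. The field names come from the excerpts quoted earlier, and the dirty/save pair follows the pattern LogicalConfirmReceivedLocation() already uses (assuming, as in the existing code, that the synced slot is the currently acquired one); read it as an illustration of the intended ordering, not as the exact patch.

    /* Stage the new values in the slot's persistent data first. */
    SpinLockAcquire(&slot->mutex);
    slot->data.restart_lsn = remote_slot->restart_lsn;
    slot->data.confirmed_flush = remote_slot->confirmed_lsn;
    slot->data.catalog_xmin = remote_slot->catalog_xmin;
    SpinLockRelease(&slot->mutex);

    /*
     * Write the changed xmin to disk *before* letting it take effect in
     * memory; otherwise a crash between the two steps could leave the
     * synced slot depending on catalog rows the primary has already
     * allowed to be removed (the scenario reproduced above).
     */
    ReplicationSlotMarkDirty();
    ReplicationSlotSave();

    /* Now it is safe to expose the new horizon. */
    SpinLockAcquire(&slot->mutex);
    slot->effective_catalog_xmin = remote_slot->catalog_xmin;
    SpinLockRelease(&slot->mutex);

    ReplicationSlotsComputeRequiredXmin(false);
    ReplicationSlotsComputeRequiredLSN();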
On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, April 4, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > BTW, while thinking on this one, I > > noticed that in the function LogicalConfirmReceivedLocation(), we first update > > the disk copy, see comment [1] and then in-memory whereas the same is not > > true in > > update_local_synced_slot() for the case when snapshot exists. Now, do we have > > the same risk here in case of standby? Because I think we will use these xmins > > while sending the feedback message (in XLogWalRcvSendHSFeedback()). > > > > * We have to write the changed xmin to disk *before* we change > > * the in-memory value, otherwise after a crash we wouldn't know > > * that some catalog tuples might have been removed already. > > Yes, I think we have the risk on the standby, I can reproduce the case that if > the server crashes after updating the in-memory value and before saving them to > disk, the synced slot could be invalidated after restarting from crash, because > the necessary rows have been removed on the primary. The steps can be found in > [1]. > > I think we'd better fix the order in update_local_synced_slot() as well. I > tried to make the fix in 0002, 0001 is Shveta's patch to fix another issue in this thread. Since > they are touching the same function, so attach them together for review. > Few comments: =============== 1. + + /* Sanity check */ + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) + ereport(LOG, + errmsg("synchronized confirmed_flush for slot \"%s\" differs from remote slot", + remote_slot->name), Is there a reason to use elevel as LOG instead of ERROR? I think it should be elog(ERROR, .. as this is an unexpected case. 2. - if (remote_slot->restart_lsn < slot->data.restart_lsn) + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) elog(ERROR, "cannot synchronize local slot \"%s\" LSN(%X/%X)" Can we be more specific in this message? How about splitting it into error_message as "cannot synchronize local slot \"%s\"" and then errdetail as "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's LSN(%X/%X)"? -- With Regards, Amit Kapila.
On Thursday, April 11, 2024 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Thursday, April 4, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > BTW, while thinking on this one, I > > > noticed that in the function LogicalConfirmReceivedLocation(), we > > > first update the disk copy, see comment [1] and then in-memory > > > whereas the same is not true in > > > update_local_synced_slot() for the case when snapshot exists. Now, > > > do we have the same risk here in case of standby? Because I think we > > > will use these xmins while sending the feedback message (in > XLogWalRcvSendHSFeedback()). > > > > > > * We have to write the changed xmin to disk *before* we change > > > * the in-memory value, otherwise after a crash we wouldn't know > > > * that some catalog tuples might have been removed already. > > > > Yes, I think we have the risk on the standby, I can reproduce the case > > that if the server crashes after updating the in-memory value and > > before saving them to disk, the synced slot could be invalidated after > > restarting from crash, because the necessary rows have been removed on > > the primary. The steps can be found in [1]. > > > > I think we'd better fix the order in update_local_synced_slot() as > > well. I tried to make the fix in 0002, 0001 is Shveta's patch to fix > > another issue in this thread. Since they are touching the same function, so > attach them together for review. > > > > Few comments: > =============== > 1. > + > + /* Sanity check */ > + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) > + ereport(LOG, errmsg("synchronized confirmed_flush for slot \"%s\" > + differs from > remote slot", > + remote_slot->name), > > Is there a reason to use elevel as LOG instead of ERROR? I think it should be > elog(ERROR, .. as this is an unexpected case. Agreed. > > 2. > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > elog(ERROR, > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > Can we be more specific in this message? How about splitting it into > error_message as "cannot synchronize local slot \"%s\"" and then errdetail as > "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's > LSN(%X/%X)"? Your version looks better. Since the above two messages all have errdetail, I used the style of ereport(ERROR, errmsg_internal(), errdetail_internal()... in the patch which is equal to the elog(ERROR but has an additional detail message. Here is V5 patch set. Best Regards, Hou zj
Attachment
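To make the agreed wording concrete, here is a minimal sketch of the errmsg/errdetail style being adopted; the field names are taken from the snippets quoted above, and the committed code may differ in detail:

    if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
        ereport(ERROR,
                errmsg_internal("cannot synchronize local slot \"%s\"",
                                remote_slot->name),
                errdetail_internal("Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's LSN(%X/%X).",
                                   LSN_FORMAT_ARGS(slot->data.confirmed_flush),
                                   LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)));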
On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, April 11, 2024 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > 2. > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > > elog(ERROR, > > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > > Can we be more specific in this message? How about splitting it into > > error_message as "cannot synchronize local slot \"%s\"" and then errdetail as > > "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's > > LSN(%X/%X)"? > > Your version looks better. Since the above two messages all have errdetail, I > used the style of ereport(ERROR, errmsg_internal(), errdetail_internal()... in > the patch which is equal to the elog(ERROR but has an additional detail message. > makes sense. > Here is V5 patch set. > I think we should move the check to not advance slot when one of remote_slot's restart_lsn or catalog_xmin is lesser than the local slot inside update_local_synced_slot() as we want to prevent updating slot in those cases even during slot synchronization. -- With Regards, Amit Kapila.
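A rough sketch of the guard being suggested inside update_local_synced_slot(); the field names follow the snippets quoted in this thread, and whether the function should skip the update or raise an error in this case is part of what the patch has to settle:

    /*
     * Do not let slot synchronization move the local slot backwards: bail
     * out if the remote slot's restart_lsn or catalog_xmin is behind the
     * local slot's values.
     */
    if (remote_slot->restart_lsn < slot->data.restart_lsn ||
        TransactionIdPrecedes(remote_slot->catalog_xmin,
                              slot->data.catalog_xmin))
        return false;   /* nothing to update */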
On Friday, April 12, 2024 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Thursday, April 11, 2024 12:11 PM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > > > 2. > > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > > > elog(ERROR, > > > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > > > > Can we be more specific in this message? How about splitting it into > > > error_message as "cannot synchronize local slot \"%s\"" and then > > > errdetail as "Local slot's start streaming location LSN(%X/%X) is > > > ahead of remote slot's LSN(%X/%X)"? > > > > Your version looks better. Since the above two messages all have > > errdetail, I used the style of ereport(ERROR, errmsg_internal(), > > errdetail_internal()... in the patch which is equal to the elog(ERROR but has an > additional detail message. > > > > makes sense. > > > Here is V5 patch set. > > > > I think we should move the check to not advance slot when one of > remote_slot's restart_lsn or catalog_xmin is lesser than the local slot inside > update_local_synced_slot() as we want to prevent updating slot in those cases > even during slot synchronization. Agreed. Here is the V6 patch which addressed this. I have merged the two patches into one. Best Regards, Hou zj
Attachment
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > > Hi, > > > > Since the standby_slot_names patch has been committed, I am attaching > > the last doc patch for review. > > > > Thanks! > > 1 === > > + continue subscribing to publications now on the new primary server > without > + any data loss. > > I think "without any data loss" should be re-worded in this context. Data loss in > the sense "data committed on the primary and not visible on the subscriber in > case of failover" can still occurs (in case synchronous replication is not used). > > 2 === > > + If the result (<literal>failover_ready</literal>) of both above steps is > + true, existing subscriptions will be able to continue without data loss. > + </para> > > I don't think that's true if synchronous replication is not used. Say, > > - synchronous replication is not used > - primary is not able to reach the standby anymore and standby_slot_names is > set > - new data is inserted into the primary > - then not replicated to subscriber (due to standby_slot_names) > > Then I think the both above steps will return true but data would be lost in case > of failover. Thanks for the comments, attach the new version patch which reworded the above places. Best Regards, Hou zj
Attachment
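As background for the synchronous-replication caveat discussed above, the window described can be closed by making the physical standby a synchronous standby on the primary. A minimal, illustrative primary-side configuration; the application_name 'standby1' is an assumed example, not something from the patch:

    # postgresql.conf on the primary -- illustrative only
    synchronous_commit = on
    synchronous_standby_names = 'standby1'   # application_name used by the physical standby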
On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > > > Hi, > > > > > > Since the standby_slot_names patch has been committed, I am attaching > > > the last doc patch for review. > > > > > > > Thanks! > > > > 1 === > > > > + continue subscribing to publications now on the new primary server > without > + any data loss. > > > > I think "without any data loss" should be re-worded in this context. Data loss in > > the sense "data committed on the primary and not visible on the subscriber in > > case of failover" can still occurs (in case synchronous replication is not used). > > > > 2 === > > > > + If the result (<literal>failover_ready</literal>) of both above steps is > > + true, existing subscriptions will be able to continue without data loss. > > + </para> > > > > I don't think that's true if synchronous replication is not used. Say, > > > > - synchronous replication is not used > > - primary is not able to reach the standby anymore and standby_slot_names is > > set > > - new data is inserted into the primary > > - then not replicated to subscriber (due to standby_slot_names) > > > > Then I think the both above steps will return true but data would be lost in case > > of failover. > > Thanks for the comments, attach the new version patch which reworded the > above places. Thanks for the patch. A few comments: 1) Tested the steps; one of the queries still refers to 'conflict_reason'. I think it should refer to 'conflicting'. 2) Will it be good to mention that, in the case of a planned promotion, it is recommended to run pg_sync_replication_slots() as a last sync attempt before we run the failover-ready validation steps? This can be mentioned in high-availability.sgml of the current patch. thanks Shveta
On Mon, Apr 29, 2024 at 11:38 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > > > Hi, > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > Hi, > > > > > > > > Since the standby_slot_names patch has been committed, I am attaching > > > > the last doc patch for review. > > > > > > > > > > Thanks! > > > > > > 1 === > > > > > > + continue subscribing to publications now on the new primary server > > > without > > > + any data loss. > > > > > > I think "without any data loss" should be re-worded in this context. Data loss in > > > the sense "data committed on the primary and not visible on the subscriber in > > > case of failover" can still occurs (in case synchronous replication is not used). > > > > > > 2 === > > > > > > + If the result (<literal>failover_ready</literal>) of both above steps is > > > + true, existing subscriptions will be able to continue without data loss. > > > + </para> > > > > > > I don't think that's true if synchronous replication is not used. Say, > > > > > > - synchronous replication is not used > > > - primary is not able to reach the standby anymore and standby_slot_names is > > > set > > > - new data is inserted into the primary > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > Then I think the both above steps will return true but data would be lost in case > > > of failover. > > > > Thanks for the comments, attach the new version patch which reworded the > > above places. > > Thanks for the patch. > > Few comments: > > 1) Tested the steps, one of the queries still refers to > 'conflict_reason'. I think it should refer 'conflicting'. > > 2) Will it be good to mention that in case of planned promotion, it is > recommended to run pg_sync_replication_slots() as last sync attempt > before we run failvoer-ready validation steps? This can be mentioned > in high-availaibility.sgml of current patch I recall now that with the latest fix, we cannot run pg_sync_replication_slots() unless we disable the slot-sync worker. Considering that, I think it will be too many steps just to run the SQL function at the end without much value added. Thus we can skip this point, we can rely on slot sync worker completely. thanks Shveta
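For reference, the manual sync being discussed is the SQL function below; as noted above it can only be run once the slot sync worker is disabled. A sketch, assuming sync_replication_slots has been set to off on the standby:

    -- On the standby, with the slot sync worker disabled (sync_replication_slots = off):
    SELECT pg_sync_replication_slots();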
On Monday, April 29, 2024 5:11 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik <shveta.malik@gmail.com> > wrote: > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > <houzj.fnst@fujitsu.com> wrote: > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > Hi, > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > Hi, > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > Thanks! > > > > > > > > 1 === > > > > > > > > + continue subscribing to publications now on the new primary > > > > + server > > > > without > > > > + any data loss. > > > > > > > > I think "without any data loss" should be re-worded in this > > > > context. Data loss in the sense "data committed on the primary and > > > > not visible on the subscriber in case of failover" can still occurs (in case > synchronous replication is not used). > > > > > > > > 2 === > > > > > > > > + If the result (<literal>failover_ready</literal>) of both above steps is > > > > + true, existing subscriptions will be able to continue without data > loss. > > > > + </para> > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > Say, > > > > > > > > - synchronous replication is not used > > > > - primary is not able to reach the standby anymore and > > > > standby_slot_names is set > > > > - new data is inserted into the primary > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > Then I think the both above steps will return true but data would > > > > be lost in case of failover. > > > > > > Thanks for the comments, attach the new version patch which reworded > > > the above places. > > > > Thanks for the patch. > > > > Few comments: > > > > 1) Tested the steps, one of the queries still refers to > > 'conflict_reason'. I think it should refer 'conflicting'. Thanks for catching this. Fixed. > > > > 2) Will it be good to mention that in case of planned promotion, it is > > recommended to run pg_sync_replication_slots() as last sync attempt > > before we run failvoer-ready validation steps? This can be mentioned > > in high-availaibility.sgml of current patch > > I recall now that with the latest fix, we cannot run > pg_sync_replication_slots() unless we disable the slot-sync worker. > Considering that, I think it will be too many steps just to run the SQL function at > the end without much value added. Thus we can skip this point, we can rely on > slot sync worker completely. Agreed. I didn't change this. Here is the V3 doc patch. Best Regards, Hou zj
Attachment
On Mon, Apr 29, 2024 at 5:28 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, April 29, 2024 5:11 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Hi, > > > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > 1 === > > > > > > > > > > + continue subscribing to publications now on the new primary > > > > > + server > > > > > without > > > > > + any data loss. > > > > > > > > > > I think "without any data loss" should be re-worded in this > > > > > context. Data loss in the sense "data committed on the primary and > > > > > not visible on the subscriber in case of failover" can still occurs (in case > > synchronous replication is not used). > > > > > > > > > > 2 === > > > > > > > > > > + If the result (<literal>failover_ready</literal>) of both above steps is > > > > > + true, existing subscriptions will be able to continue without data > > loss. > > > > > + </para> > > > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > > Say, > > > > > > > > > > - synchronous replication is not used > > > > > - primary is not able to reach the standby anymore and > > > > > standby_slot_names is set > > > > > - new data is inserted into the primary > > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > > > Then I think the both above steps will return true but data would > > > > > be lost in case of failover. > > > > > > > > Thanks for the comments, attach the new version patch which reworded > > > > the above places. > > > > > > Thanks for the patch. > > > > > > Few comments: > > > > > > 1) Tested the steps, one of the queries still refers to > > > 'conflict_reason'. I think it should refer 'conflicting'. > > Thanks for catching this. Fixed. > > > > > > > 2) Will it be good to mention that in case of planned promotion, it is > > > recommended to run pg_sync_replication_slots() as last sync attempt > > > before we run failvoer-ready validation steps? This can be mentioned > > > in high-availaibility.sgml of current patch > > > > I recall now that with the latest fix, we cannot run > > pg_sync_replication_slots() unless we disable the slot-sync worker. > > Considering that, I think it will be too many steps just to run the SQL function at > > the end without much value added. Thus we can skip this point, we can rely on > > slot sync worker completely. > > Agreed. I didn't change this. > > Here is the V3 doc patch. Thanks for the patch. It will be good if 1a can produce quoted slot-names list as output, which can be used directly in step 1b's query; otherwise, it is little inconvenient to give input to 1b if the number of slots are huge. User needs to manually quote each slot-name. Other than this, the patch looks good to me. thanks Shveta
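One way to do what is being asked here, sketched below and not part of the posted patch: have the step-1a query aggregate names that are already quoted (via quote_literal()), so the output can be pasted directly into the IN (...) list of step 1b. The query is modelled on the one in the documentation patch, and the sample output reuses its example slot names:

test_sub=# SELECT string_agg(quote_literal(slot_name), ', ') AS quoted_slots
           FROM (
               SELECT CONCAT('pg_', r.srsubid, '_sync_', r.srrelid, '_', ctl.system_identifier) AS slot_name
               FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
               WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
               UNION
               SELECT s.subslotname AS slot_name
               FROM pg_subscription s
               WHERE s.subfailover
           ) AS failover_slots;
      quoted_slots
------------------------
 'sub1', 'sub2', 'sub3'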
Hi, On Mon, Apr 29, 2024 at 11:58:09AM +0000, Zhijie Hou (Fujitsu) wrote: > On Monday, April 29, 2024 5:11 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik <shveta.malik@gmail.com> > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +0000, Zhijie Hou (Fujitsu) wrote: > > > > > > Hi, > > > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > 1 === > > > > > > > > > > + continue subscribing to publications now on the new primary > > > > > + server > > > > > without > > > > > + any data loss. > > > > > > > > > > I think "without any data loss" should be re-worded in this > > > > > context. Data loss in the sense "data committed on the primary and > > > > > not visible on the subscriber in case of failover" can still occurs (in case > > synchronous replication is not used). > > > > > > > > > > 2 === > > > > > > > > > > + If the result (<literal>failover_ready</literal>) of both above steps is > > > > > + true, existing subscriptions will be able to continue without data > > loss. > > > > > + </para> > > > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > > Say, > > > > > > > > > > - synchronous replication is not used > > > > > - primary is not able to reach the standby anymore and > > > > > standby_slot_names is set > > > > > - new data is inserted into the primary > > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > > > Then I think the both above steps will return true but data would > > > > > be lost in case of failover. > > > > > > > > Thanks for the comments, attach the new version patch which reworded > > > > the above places. Thanks! > Here is the V3 doc patch. Thanks! A few comments: 1 === + losing any data that has been flushed to the new primary server. Worth to add a few words about possible data loss, something like? Please note that in case synchronous replication is not used and standby_slot_names is set correctly, it might be possible to lose data that would have been committed on the old primary server (in case the standby was not reachable during that time for example). 2 === +test_sub=# SELECT + array_agg(slotname) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slotname + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( I guess this format comes from ReplicationSlotNameForTablesync(). What about creating a SQL callable function on top of it and make use of it in the query above? (that would ensure to keep the doc up to date even if the format changes in ReplicationSlotNameForTablesync()). 
3 === +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM + (( + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid),false) + WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn + FROM pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn + FROM pg_subscription s + WHERE s.subfailover + )); What about adding a join to pg_replication_origin to get rid of the "hardcoded" format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"? Idea behind 2 === and 3 === is to have the queries as generic as possible and not rely on a hardcoded format (that would be more difficult to maintain should those formats change in the future). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Here are some review comments for the docs patch v3-0001. ====== Commit message 1. This patch adds detailed documentation for the slot sync feature including examples to guide users on how to verify that all slots have been successfully synchronized to the standby server and how to confirm whether the subscription can continue subscribing to publications on the promoted standby server. ~ This may be easier to read if you put it in bullet form like below: SUGGESTION It includes examples to guide the user: * How to verify that all slots have been successfully synchronized to the standby server * How to confirm whether the subscription can continue subscribing to publications on the promoted standby server ====== doc/src/sgml/high-availability.sgml 2. + <para> + If you have opted for synchronization of logical slots (see + <xref linkend="logicaldecoding-replication-slots-synchronization"/>), + then before switching to the standby server, it is recommended to check + if the logical slots synchronized on the standby server are ready + for failover. This can be done by following the steps described in + <xref linkend="logical-replication-failover"/>. + </para> + Maybe it is better to call this feature "logical replication slot synchronization" to be more consistent with the title of section 47.2.3. SUGGESTION If you have opted for logical replication slot synchronization (see ... ====== doc/src/sgml/logical-replication.sgml 3. + <para> + When the publisher server is the primary server of a streaming replication, + the logical slots on that primary server can be synchronized to the standby + server by specifying <literal>failover = true</literal> when creating + subscriptions for those publications. Enabling failover ensures a seamless + transition of those subscriptions after the standby is promoted. They can + continue subscribing to publications now on the new primary server without + losing any data that has been flushed to the new primary server. + </para> + 3a. BEFORE When the publisher server is the primary server of... SUGGESTION When publications are defined on the primary server of... ~ 3b. Enabling failover... Maybe say "Enabling the failover parameter..." and IMO there should also be a link to the CREATE SUBSCRIPTION failover parameter so the user can easily navigate there to read more about it. ~ 3c. BEFORE They can continue subscribing to publications now on the new primary server without losing any data that has been flushed to the new primary server. SUGGESTION (removes some extra info I did not think was needed) They can continue subscribing to publications now on the new primary server without any loss of data. ~~~ 4. + <para> + Because the slot synchronization logic copies asynchronously, it is + necessary to confirm that replication slots have been synced to the standby + server before the failover happens. Furthermore, to ensure a successful + failover, the standby server must not be lagging behind the subscriber. It + is highly recommended to use + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + to prevent the subscriber from consuming changes faster than the hot standby. + To confirm that the standby server is indeed ready for failover, follow + these 2 steps: + </para> IMO the last sentence "To confirm..." should be a new paragraph. ~~~ 5. + <para> + Firstly, on the subscriber node, use the following SQL to identify + which slots should be synced to the standby that we plan to promote. 
AND + <para> + Next, check that the logical replication slots identified above exist on + the standby server and are ready for failover. ~~ 5a. I don't think you need to say "Firstly," and "Next," because the order to do these steps is already self-evident. ~ 5b. Patch says "on the subscriber node", but isn't that the simplest case? e.g. maybe there are multiple nodes having subscriptions for these publications. Maybe the sentence needs to account for case of subscribers on >1 nodes. Is there no way to discover this information by querying the publisher? ~~~ 6. +<programlisting> +test_sub=# SELECT + array_agg(slotname) AS slots + FROM ... +<programlisting> +test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready + FROM pg_replication_slots + WHERE slot_name IN ('sub1','sub2','sub3'); The example SQL for "1a" refers to 'slotname', but the example SQL for "1b" refers to "slot_name" (i.e. with underscore). It might be better if those are consistently called 'slot_name'. ~~~ 7. + <step performance="required"> + <para> + Confirm that the standby server is not lagging behind the subscribers. + This step can be skipped if + <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link> + has been correctly configured. If standby_slot_names is not configured + correctly, it is highly recommended to run this step after the primary + server is down, otherwise the results of the query may vary at different + points of time due to the ongoing replication on the logical subscribers + from the primary server. + </para> 7a. I felt that the step should just say "Confirm that the standby server is not lagging behind the subscribers.". So the text "This step can be skipped..." should be a separate paragraph. ~ 7b. The 2nd standby_slot_names should use a varname font. ~ 7c. /may vary at different points in time due to/can vary due to/ ~~~~ 8. + <para> + Firstly, on the subscriber node check the last replayed WAL. + This step needs to be run on the database(s) that includes the failover + enabled subscription(s), to find the last replayed WAL on each database. 8a. Don't need to say "Firstly," ~ 8b. The text "This step..." can be simplified as below: BEFORE This step needs to be run on the database(s) that includes the failover enabled subscription(s), to find the last replayed WAL on each database. SUGGESTION This step needs to be run on any database that includes failover-enabled subscriptions. ~~~ 9. + <para> + Next, on the standby server check that the last-received WAL location + is ahead of the replayed WAL location(s) on the subscriber identified + above. If the above SQL result was NULL, it means the subscriber has not + yet replayed any WAL, so the standby server must be ahead of the + subscriber, and this step can be skipped. Don't need to say "Next," ~~~ 10. + <para> + If the result (<literal>failover_ready</literal>) of both above steps is + true, existing subscriptions will be able to continue without losing any data + that has been flushed to the new primary server. + </para> Let's word this more like the same sentence top of the page. See review comment #3c SUGGESTION If the result (<literal>failover_ready</literal>) of both steps above is true, then existing subscriptions can continue subscribing to publications now on the new primary server without any loss of data. ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wednesday, May 8, 2024 5:21 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > A few comments: Thanks for the comments! > 2 === > > +test_sub=# SELECT > + array_agg(slotname) AS slots > + FROM > + (( > + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', > srrelid, '_', ctl.system_identifier) AS slotname > + FROM pg_control_system() ctl, pg_subscription_rel r, > pg_subscription s > + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND > s.subfailover > + ) UNION ( > > I guess this format comes from ReplicationSlotNameForTablesync(). What > about creating a SQL callable function on top of it and make use of it in the > query above? (that would ensure to keep the doc up to date even if the format > changes in ReplicationSlotNameForTablesync()). We could add a new function as suggested but I think it's not the right time(beta1) to add this function because new function will bring catversion bump which I think may not be worth at this stage. I think we can consider this after releasing and maybe gather more use cases for the new function you suggested. > > 3 === > > +test_sub=# SELECT > + MAX(remote_lsn) AS remote_lsn_on_subscriber > + FROM > + (( > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), false) > + WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn > END) AS remote_lsn > + FROM pg_subscription_rel r, pg_subscription s > + WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND > s.subfailover > + ) UNION ( > + SELECT pg_replication_origin_progress(CONCAT('pg_', > s.oid), false) AS remote_lsn > + FROM pg_subscription s > + WHERE s.subfailover > + )); > > What about adding a join to pg_replication_origin to get rid of the "hardcoded" > format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"? I tried a bit, but it doesn't seem feasible to get the relationship between subscription and origin by querying pg_subscription and pg_replication_origin. Best Regards, Hou zj
On Thursday, May 23, 2024 1:34 PM Peter Smith <smithpb2250@gmail.com> wrote: Thanks for the comments. I addressed most of the comments, except the following one, which I am not sure about: > 5b. > Patch says "on the subscriber node", but isn't that the simplest case? > e.g. maybe there are multiple nodes having subscriptions for these > publications. Maybe the sentence needs to account for case of subscribers on > >1 nodes. I think it's not necessary to mention the multiple-node case, as in that case the user can just perform the same steps on each node that has failover-enabled subscriptions. > Is there no way to discover this information by querying the publisher? I am not aware of a way for the user to get the necessary information, such as the replication origin progress, on the publisher, because such information is only available on the subscriber. Attached is the V4 doc patch, which addresses Peter's and Bertrand's comments. Best Regards, Hou zj
Attachment
On Wed, Jun 5, 2024 at 7:52 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Attach the V4 doc patch which addressed Peter and Bertrand's comments. > Few comments: 1. + On the subscriber node, use the following SQL to identify + which slots should be synced to the standby that we plan to promote. +<programlisting> +test_sub=# SELECT + array_agg(slot_name) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT s.oid AS subid, s.subslotname as slot_name + FROM pg_subscription s + WHERE s.subfailover + )); + slots +------- + {sub1,sub2,sub3} This should additionally say what exactly this SQL is doing to fetch the required slots. 2. If <varname>standby_slot_names</varname> is + not configured correctly, it is highly recommended to run this step after + the primary server is down, otherwise the results of the query can vary + due to the ongoing replication on the logical subscribers from the primary + server. + </para> + <substeps> + <step performance="required"> + <para> + On the subscriber node, check the last replayed WAL. + This step needs to be run on any database that includes failover enabled + subscriptions. +<programlisting> +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM If the 'standby_slot_names' is not configured then we can't ensure that standby is always ahead because what if immediately after running this query the additional WAL got synced to the subscriber before standby? Now, as you mentioned users can first shutdown primary to ensure that no additional WAL is sent to the subscriber. After that, it is possible that one can use these complex queries to ensure that the subscriber is behind the standby but it is better to encourage users to use standby_slot_names to ensure the same. If at all we get such use cases and or requirements then we can add such additional steps after understanding the user's requirements. For now, we should remove these additional steps. -- With Regards, Amit Kapila.
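To make the recommendation concrete, a minimal sketch of the setting in question; the slot name 'standby1' is an assumed example for the physical replication slot used by the standby that is the failover candidate:

    # postgresql.conf on the primary -- illustrative only
    standby_slot_names = 'standby1'

With this set, logical walsenders for failover-enabled slots do not send changes to subscribers until the named physical standby has confirmed the corresponding WAL, which is what allows the manual lag check discussed above to be skipped.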
Hi. Here are some minor review comments for the docs patch v4-0001. ====== doc/src/sgml/logical-replication.sgml 1. General The SGML file wrapping can be fixed to fill up to 80 cols for some of the paragraphs. ~~~ 2. + standby is promoted. They can continue subscribing to publications now on the + new primary server without any loss of data. But please note that in case of + asynchronous replication, there remains a risk of data loss for transactions + that have been committed on the former primary server but have yet to be + replicated to the new primary server. + </para> /in case/in the case/ /But please note that.../Note that.../ ====== Kind Regards, Peter Smith. Fujitsu Australia
On Wednesday, June 5, 2024 2:32 PM Peter Smith <smithpb2250@gmail.com> wrote: > Hi. Here are some minor review comments for the docs patch v4-0001. Thanks for the comments! > The SGML file wrapping can be fixed to fill up to 80 cols for some of the > paragraphs. Unlike comments in C code, I think we don't enforce the 80-column limit in doc files unless a line is too long to read. I checked the doc once and think it's OK. Here is the V5 patch, which addresses Peter's comments and Amit's comments[1]. [1] https://www.postgresql.org/message-id/CAA4eK1%2Bq1MYGgF3-LZCj6Xd0idujnjbTsfk-RqU%2BC51wYGaD5g%40mail.gmail.com Best Regards, Hou zj
Attachment
Hi, here are some review comments for the docs patch v5-0001. Apart from these it LGTM. ====== doc/src/sgml/logical-replication.sgml 1. + <para> + On the subscriber node, use the following SQL to identify which slots + should be synced to the standby that we plan to promote. This query will + return the relevant replication slots, including the main slots and table + synchronization slots associated with the failover enabled subscriptions. /failover enabled/failover-enabled/ ~~~ 2. + <para> + If all the slots are present on the standby server and result + (<literal>failover_ready</literal>) of is true, then existing subscriptions + can continue subscribing to publications now on the new primary server + without any loss of data. + </para> Hmm. It looks like there is some typo or missing words here: "of is true". Did you mean something like: "of the above SQL query is true"? ====== Kind Regards, Peter Smith. Fujitsu Australia
On Thursday, June 6, 2024 12:21 PM Peter Smith <smithpb2250@gmail.com> > > Hi, here are some review comments for the docs patch v5-0001. Thanks for the comments! Here is the V6 patch that addresses these. Best Regards, Hou zj
Attachment
On Fri, Jun 7, 2024 at 7:57 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Thanks for the comments! Here is the V6 patch that addressed the these. > I have pushed this after making minor changes in the wording. I have also changed one of the queries in docs to ignore the NULL slot_name values. -- With Regards, Amit Kapila.