Thread: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

From
Jeff Davis
Date:
Commit 19890a064 changed pg_create_logical_replication_slot() to allow
decoding of two-phase transactions, but did not extend the
CREATE_REPLICATION_SLOT command to support it. Strangely, it does
extend the CreateReplicationSlotCmd struct to add a "two_phase" field,
but doesn't set it anywhere.

There were patches[1] from around the time of the commit to support
CREATE_REPLICATION_SLOT as well.

Is there a reason to support two-phase decoding, but not with the
replication protocol? If so, why change the CreateReplicationSlotCmd
structure as though we will support it?

Regards,
    Jeff Davis

[1] 
https://www.postgresql.org/message-id/CAFPTHDZ2rigOf0oM0OBhv1yRmyMO5-SQfT9FCLYj-Jp9ShXB3A@mail.gmail.com





On Thu, Jun 3, 2021 at 4:48 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> Commit 19890a064 changed pg_create_logical_replication_slot() to allow
> decoding of two-phase transactions, but did not extend the
> CREATE_REPLICATION_SLOT command to support it. Strangely, it does
> extend the CreateReplicationSlotCmd struct to add a "two_phase" field,
> but doesn't set it anywhere.
>
> There were patches[1] from around the time of the commit to support
> CREATE_REPLICATION_SLOT as well.
>
> Is there a reason to support two-phase decoding, but not with the
> replication protocol? If so, why change the CreateReplicationSlotCmd
> structure as though we will support it?
>

The idea is to support two_phase via protocol with a subscriber-side
work where we can test it as well. The code to support it via
replication protocol is present in the first patch of subscriber-side
work [1] which uses that code as well. Basically, we don't have a good
way to test it without subscriber-side work so decided to postpone it
till the corresponding work is done. I think we can remove the change
in CreateReplicationSlotCmd, that is a leftover. If we have to support
it via protocol, then at the minimum, we need to enhance
pg_recvlogical so that the same can be tested.

[1] - https://www.postgresql.org/message-id/CAHut%2BPt7wnctZpfhaLyuPA0mXDAtuw7DsMUfw3TePJLxqTArjA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



On Thu, 2021-06-03 at 09:29 +0530, Amit Kapila wrote:
> The idea is to support two_phase via protocol with a subscriber-side
> work where we can test it as well. The code to support it via
> replication protocol is present in the first patch of subscriber-side
> work [1] which uses that code as well. Basically, we don't have a
> good
> way to test it without subscriber-side work so decided to postpone it
> till the corresponding work is done.

Thank you for clarifying.

Right now, it feels a bit incomplete. If it's not much work, I
recommend breaking out the CREATE_REPLICATION_SLOT changes and updating
pg_recvlogical, so that it can go in v14 (and
pg_create_logical_replication_slot() will match
CREATE_REPLICATION_SLOT). But if that's complicated or controversial,
then I'm fine waiting for the other work to complete.

Regards,
    Jeff Davis





On Thu, Jun 3, 2021 at 10:08 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2021-06-03 at 09:29 +0530, Amit Kapila wrote:
> > The idea is to support two_phase via protocol with a subscriber-side
> > work where we can test it as well. The code to support it via
> > replication protocol is present in the first patch of subscriber-side
> > work [1] which uses that code as well. Basically, we don't have a
> > good
> > way to test it without subscriber-side work so decided to postpone it
> > till the corresponding work is done.
>
> Thank you for clarifying.
>
> Right now, it feels a bit incomplete. If it's not much work, I
> recommend breaking out the CREATE_REPLICATION_SLOT changes and updating
> pg_recvlogical, so that it can go in v14 (and
> pg_create_logical_replication_slot() will match
> CREATE_REPLICATION_SLOT). But if that's complicated or controversial,
> then I'm fine waiting for the other work to complete.
>

I think we can try but not sure if we can get it by then. So, here is
my suggestion:
a. remove the change in CreateReplicationSlotCmd
b. prepare the patches for protocol change and pg_recvlogical. This
will anyway include the change we removed as part of (a).

Does that sound reasonable?

-- 
With Regards,
Amit Kapila.



On Fri, Jun 4, 2021 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> I think we can try but not sure if we can get it by then. So, here is
> my suggestion:
> a. remove the change in CreateReplicationSlotCmd
> b. prepare the patches for protocol change and pg_recvlogical. This
> will anyway include the change we removed as part of (a).


Attaching two patches:
1. Removes two-phase from CreateReplicationSlotCmd
2. Adds two-phase option in CREATE_REPLICATION_SLOT command.

I will send a patch to update pg_recvlogical next week.

regards,
Ajin Cherian
Fujitsu Australia

Attachment
On Fri, 2021-06-04 at 08:36 +0530, Amit Kapila wrote:
> I think we can try but not sure if we can get it by then. So, here is
> my suggestion:
> a. remove the change in CreateReplicationSlotCmd
> b. prepare the patches for protocol change and pg_recvlogical. This
> will anyway include the change we removed as part of (a).

Yes, sounds good.

Regards,
    Jeff Davis




On Fri, Jun 4, 2021 at 2:29 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Fri, Jun 4, 2021 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > I think we can try but not sure if we can get it by then. So, here is
> > my suggestion:
> > a. remove the change in CreateReplicationSlotCmd
> > b. prepare the patches for protocol change and pg_recvlogical. This
> > will anyway include the change we removed as part of (a).
>
>
> Attaching two patches:
> 1. Removes two-phase from CreateReplicationSlotCmd
>

Pushed the above patch.

-- 
With Regards,
Amit Kapila.



On Mon, Jun 7, 2021 at 3:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> Pushed the above patch.

Here's an updated patchset that adds back in the option for two-phase
in CREATE_REPLICATION_SLOT command and a second patch that adds
support for
two-phase decoding in pg_recvlogical.

regards,
Ajin Cherian

Attachment
On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> Here's an updated patchset that adds back in the option for two-phase
> in CREATE_REPLICATION_SLOT command and a second patch that adds
> support for
> two-phase decoding in pg_recvlogical.

A few things:

* I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
* Document the TWO_PHASE keyword in doc/src/sgml/protocol.sgml
* Cross check that --two-phase is specified only if --create-slot is
specified
* Maybe an Assert(!(two_phase && is_physical)) in
CreateReplicationSlot()?

Other than that, it looks good, and it works as I expect it to.

Regards,
    Jeff Davis





On Wed, Jun 9, 2021 at 6:23 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > Here's an updated patchset that adds back in the option for two-phase
> > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > support for
> > two-phase decoding in pg_recvlogical.
>
> A few things:
>
> * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
> * Document the TWO_PHASE keyword in doc/src/sgml/protocol.sgml
> * Cross check that --two-phase is specified only if --create-slot is
> specified
> * Maybe an Assert(!(two_phase && is_physical)) in
> CreateReplicationSlot()?
>
> Other than that, it looks good, and it works as I expect it to.


Updated. Do have a look.

thanks,
Ajin Cherian
Fujitsu Australia

Attachment
On Wed, Jun 9, 2021 at 1:53 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > Here's an updated patchset that adds back in the option for two-phase
> > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > support for
> > two-phase decoding in pg_recvlogical.
>
> A few things:
>
> * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
>

Isn't it better to add it after LOGICAL IDENT? In docs [1], we expect
that way. Also, see code in libpqrcv_create_slot where we expect them
to be together but we can change that if you still prefer to add it
after LOGICAL. BTW, can't we consider it to be part of
create_slot_opt_list?

Also, it might be good if we can add a test in
src/bin/pg_basebackup/t/030_pg_recvlogical

[1] - https://www.postgresql.org/docs/devel/logicaldecoding-walsender.html

-- 
With Regards,
Amit Kapila.



On Wed, Jun 9, 2021 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 1:53 AM Jeff Davis <pgsql@j-davis.com> wrote:
> >
> > On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > > Here's an updated patchset that adds back in the option for two-phase
> > > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > > support for
> > > two-phase decoding in pg_recvlogical.
> >
> > A few things:
> >
> > * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
> >
>
> Isn't it better to add it after LOGICAL IDENT? In docs [1], we expect
> that way. Also, see code in libpqrcv_create_slot where we expect them
> to be together but we can change that if you still prefer to add it
> after LOGICAL. BTW, can't we consider it to be part of
> create_slot_opt_list?
>
> Also, it might be good if we can add a test in
> src/bin/pg_basebackup/t/030_pg_recvlogical
>

Some more points:
1. pg_recvlogical can only send two_phase option if
(PQserverVersion(conn) >= 140000), otherwise, it won't work for older
versions of the server.
2. In the main patch [1], we do send two_phase option even during
START_REPLICATION for the very first time when the two_phase can be
enabled. There are reasons as described in the worker.c why we can't
enable it during CREATE_REPLICATION_SLOT. Now, if we want to do
protocol changes, I wonder why only do some changes and leave the rest
for the next version?

[1] - https://www.postgresql.org/message-id/CAHut%2BPt7wnctZpfhaLyuPA0mXDAtuw7DsMUfw3TePJLxqTArjA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



On Wed, 2021-06-09 at 16:50 +0530, Amit Kapila wrote:
> BTW, can't we consider it to be part of
> create_slot_opt_list?

Yes, that would be better. It looks like the physical and logical slot
options are mixed together -- should they be separated in the grammar
so that using an option with the wrong kind of slot would be a parse
error?

Regards,
    Jeff Davis





On Wed, 2021-06-09 at 17:27 +0530, Amit Kapila wrote:
> 2. In the main patch [1], we do send two_phase option even during
> START_REPLICATION for the very first time when the two_phase can be
> enabled. There are reasons as described in the worker.c why we can't
> enable it during CREATE_REPLICATION_SLOT. 

I'll have to catch up on the thread to digest that reasoning and how it
applies to decoding vs. replication. But there don't seem to be changes
to START_REPLICATION for twophase, so I don't quite follow your point.

Are you saying that we should not be able to create slots with twophase
at all until the rest of the changes go in?

> Now, if we want to do
> protocol changes, I wonder why only do some changes and leave the
> rest
> for the next version?

I started this thread because it's possible to create a slot a certain
way using the SQL function create_logical_replication_slot(), but it's
impossible over the replication protocol. That seems inconsistent to
me.

Regards,
    Jeff Davis





Jeff Davis <pgsql@j-davis.com> writes:
> On Wed, 2021-06-09 at 16:50 +0530, Amit Kapila wrote:
>> BTW, can't we consider it to be part of
>> create_slot_opt_list?

> Yes, that would be better. It looks like the physical and logical slot
> options are mixed together -- should they be separated in the grammar
> so that using an option with the wrong kind of slot would be a parse
> error?

That sort of parse error is usually pretty unfriendly to users who
may not quite remember which options are for what; all they'll get
is "syntax error" which won't illuminate anything.  I'd rather let
the grammar accept both, and throw an appropriate error further
downstream.

            regards, tom lane



On Thu, Jun 10, 2021 at 4:13 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2021-06-09 at 17:27 +0530, Amit Kapila wrote:
> > 2. In the main patch [1], we do send two_phase option even during
> > START_REPLICATION for the very first time when the two_phase can be
> > enabled. There are reasons as described in the worker.c why we can't
> > enable it during CREATE_REPLICATION_SLOT.
>
> I'll have to catch up on the thread to digest that reasoning and how it
> applies to decoding vs. replication. But there don't seem to be changes
> to START_REPLICATION for twophase, so I don't quite follow your point.
>

I think it is because we pass there as an option as I have suggested
doing in the case of CREATE_REPLICATION_SLOT.

> Are you saying that we should not be able to create slots with twophase
> at all until the rest of the changes go in?
>

No, the slots will be created but two_phase option will be enabled
only after the initial tablesync is complete.

> > Now, if we want to do
> > protocol changes, I wonder why only do some changes and leave the
> > rest
> > for the next version?
>
> I started this thread because it's possible to create a slot a certain
> way using the SQL function create_logical_replication_slot(), but it's
> impossible over the replication protocol. That seems inconsistent to
> me.
>

Right, I understand that but on the protocol side, there are few more
things to be considered to allow subscribers to enable two_phase.
However, maybe, for now, we can do it just for create_replication_slot
and the start_replication stuff required for subscribers can be done
later. I was not completely sure if that is a good idea.

-- 
With Regards,
Amit Kapila.



On Wed, Jun 9, 2021 at 9:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jun 9, 2021 at 1:53 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > >
> > > On Tue, 2021-06-08 at 17:41 +1000, Ajin Cherian wrote:
> > > > Here's an updated patchset that adds back in the option for two-phase
> > > > in CREATE_REPLICATION_SLOT command and a second patch that adds
> > > > support for
> > > > two-phase decoding in pg_recvlogical.
> > >
> > > A few things:
> > >
> > > * I suggest putting the TWO_PHASE keyword after the LOGICAL keyword
> > >
> >
> > Isn't it better to add it after LOGICAL IDENT? In docs [1], we expect
> > that way. Also, see code in libpqrcv_create_slot where we expect them
> > to be together but we can change that if you still prefer to add it
> > after LOGICAL. BTW, can't we consider it to be part of
> > create_slot_opt_list?

Changed accordingly.

> Some more points:
> 1. pg_recvlogical can only send two_phase option if
> (PQserverVersion(conn) >= 140000), otherwise, it won't work for older
> versions of the server.

Updated accordingly.

I've also modified the pg_recvlogical test case with the new option.

regards,
Ajin Cherian

Attachment
On Thu, Jun 10, 2021 at 2:04 PM Ajin Cherian <itsajin@gmail.com> wrote:
>

The new patches look mostly good apart from the below cosmetic issues.
I think the question is whether we want to do these for PG-14 or
postpone them till PG-15. I think these don't appear to be risky
changes so we can get them in PG-14 as that might help some outside
core solutions as appears to be the case for Jeff. The changes related
to start_replication are too specific to the subscriber-side solution
so we can postpone those along with the subscriber-side 2PC work.
Jeff, Ajin, what do you think?

Also, I can take care of the below cosmetic issues before committing
if we decide to do this for PG-14.

Few cosmetic issues:
==================
1. git diff --check shows
src/bin/pg_basebackup/t/030_pg_recvlogical.pl:109: new blank line at EOF.

2.
+
   <para>
   The following example shows SQL interface that can be used to decode prepared
   transactions. Before you use two-phase commit commands, you must set

Spurious line addition.

3.
/* Build query */
  appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
  if (is_temporary)
  appendPQExpBufferStr(query, " TEMPORARY");
+
  if (is_physical)

Spurious line addition.

4.
  appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
+ if (two_phase && PQserverVersion(conn) >= 140000)
+ appendPQExpBufferStr(query, " TWO_PHASE");
+
  if (PQserverVersion(conn) >= 100000)
  /* pg_recvlogical doesn't use an exported snapshot, so suppress */
  appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");

I think it might be better to append TWO_PHASE after NOEXPORT_SNAPSHOT
but it doesn't matter much.

5.
+$node->safe_psql('postgres',
+ "BEGIN;INSERT INTO test_table values (11); PREPARE TRANSACTION 'test'");

There is no space after BEGIN but there is a space after INSERT. For
consistency-sake, I will have space after BEGIN as well.



-- 
With Regards,
Amit Kapila.



On Fri, Jun 11, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 2:04 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
>
> The new patches look mostly good apart from the below cosmetic issues.
> I think the question is whether we want to do these for PG-14 or
> postpone them till PG-15. I think these don't appear to be risky
> changes so we can get them in PG-14 as that might help some outside
> core solutions as appears to be the case for Jeff. The changes related
> to start_replication are too specific to the subscriber-side solution
> so we can postpone those along with the subscriber-side 2PC work.
> Jeff, Ajin, what do you think?
>

Since we are exposing two-phase decoding using the
pg_create_replication_slot API, I think
it is reasonable to expose CREATE_REPLICATION_SLOT as well. We can
leave the subscriber side changes
for PG-15.

> Also, I can take care of the below cosmetic issues before committing
> if we decide to do this for PG-14.

Thanks,

Regards,
Ajin Cherian
Fujitsu Australia



On Fri, 2021-06-11 at 15:43 +0530, Amit Kapila wrote:
> The new patches look mostly good apart from the below cosmetic
> issues.
> I think the question is whether we want to do these for PG-14 or
> postpone them till PG-15. I think these don't appear to be risky
> changes so we can get them in PG-14 as that might help some outside
> core solutions as appears to be the case for Jeff. 

My main interest here is that I'm working on replication protocol
support in a rust library. While doing that, I've encountered a lot of
rough edges (as you may have seen in my recent posts), and this patch
fixes one of them.

But at the same time, several small changes to the protocol spread
across several releases introduces more opportunity for confusion.

If we are confident this is the right direction, then v14 makes sense
for consistency. But if waiting for v15 might result in a better
change, then we should probably just get it into the July CF for v15.

(My apologies if my opinion has drifted a bit since this thread began.)

Regards,
    Jeff Davis





On Sat, Jun 12, 2021 at 12:56 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Fri, 2021-06-11 at 15:43 +0530, Amit Kapila wrote:
> > The new patches look mostly good apart from the below cosmetic
> > issues.
> > I think the question is whether we want to do these for PG-14 or
> > postpone them till PG-15. I think these don't appear to be risky
> > changes so we can get them in PG-14 as that might help some outside
> > core solutions as appears to be the case for Jeff.
>
> My main interest here is that I'm working on replication protocol
> support in a rust library. While doing that, I've encountered a lot of
> rough edges (as you may have seen in my recent posts), and this patch
> fixes one of them.
>
> But at the same time, several small changes to the protocol spread
> across several releases introduces more opportunity for confusion.
>
> If we are confident this is the right direction, then v14 makes sense
> for consistency. But if waiting for v15 might result in a better
> change, then we should probably just get it into the July CF for v15.
>

If that is the case, I would prefer July CF v15. The patch is almost
ready, so I'll try to get it early in the July CF. Ajin, feel free to
post the patch after addressing some minor comments raised by me
yesterday.

-- 
With Regards,
Amit Kapila.



On Fri, Jun 11, 2021 at 8:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

> Also, I can take care of the below cosmetic issues before committing
> if we decide to do this for PG-14.
>
> Few cosmetic issues:
> ==================
> 1. git diff --check shows
> src/bin/pg_basebackup/t/030_pg_recvlogical.pl:109: new blank line at EOF.
>
> 2.
> +
>    <para>
>    The following example shows SQL interface that can be used to decode prepared
>    transactions. Before you use two-phase commit commands, you must set
>
> Spurious line addition.
>

Fixed.

> 3.
> /* Build query */
>   appendPQExpBuffer(query, "CREATE_REPLICATION_SLOT \"%s\"", slot_name);
>   if (is_temporary)
>   appendPQExpBufferStr(query, " TEMPORARY");
> +
>   if (is_physical)
>
> Spurious line addition.
>

Fixed.

> 4.
>   appendPQExpBuffer(query, " LOGICAL \"%s\"", plugin);
> + if (two_phase && PQserverVersion(conn) >= 140000)
> + appendPQExpBufferStr(query, " TWO_PHASE");
> +
>   if (PQserverVersion(conn) >= 100000)
>   /* pg_recvlogical doesn't use an exported snapshot, so suppress */
>   appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
>
> I think it might be better to append TWO_PHASE after NOEXPORT_SNAPSHOT
> but it doesn't matter much.
>

I haven't changed this, I like to keep it this way.

> 5.
> +$node->safe_psql('postgres',
> + "BEGIN;INSERT INTO test_table values (11); PREPARE TRANSACTION 'test'");
>
> There is no space after BEGIN but there is a space after INSERT. For
> consistency-sake, I will have space after BEGIN as well.

Changed this.

regards,
Ajin Cherian
Fujitsu Australia

Attachment
On Tue, Jun 15, 2021 at 5:34 PM Ajin Cherian <itsajin@gmail.com> wrote:

Since we've decided to not commit this for PG-14, I've added these
patches along with the larger patch-set for
subscriber side 2pc in thread [1]

[1] -  https://www.postgresql.org/message-id/CAHut+PuJKTNRjFre0VBufWMz9BEScC__nT+PUhbSaUNW2biPow@mail.gmail.com

regards,
Ajin Cherian