Thread: Re: [PATCH] Provide more information to filter_prepare

Re: [PATCH] Provide more information to filter_prepare

From
Amit Kapila
Date:
On Sat, Mar 13, 2021 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 11, 2021 at 2:44 PM Markus Wanner
> <markus.wanner@enterprisedb.com> wrote:
> >
> > On 11.03.21 04:58, Amit Kapila wrote:
> > > But this happens when we are decoding prepare, so it is clear that the
> > > transaction is prepared, why any additional check?
> >
> > An output plugin cannot assume the transaction is still prepared and
> > uncommitted at the point in time it gets to decode the prepare.
> > Therefore, the transaction may or may not be still in progress.
> > However, my point is that the xid is the more generally useful
> > identifier than the gid.
> >
> > > What in this can't be done with GID and how XID can achieve it?
> >
> > It's a convenience.  Of course, an output plugin could lookup the xid
> > via the gid.  But why force it to have to do that when the xid would be
> > so readily available?
> >
>
> I am not suggesting doing any such look-up. It is just that the use of
> additional parameter(s) for deciding whether to decode at prepare time
> or to decode later as a regular one-phase transaction is not clear to
> me. Now, it is possible that your argument is right that passing
> additional information gives flexibility to plugin authors and we
> should just do what you are saying or maybe go even a step further and
> pass ReorderBufferTxn but I am not completely sure about this point
> because I didn't hear of any concrete use case.
>

During a discussion of GID's in the nearby thread [1], it came up that
the replication solutions might want to generate a different GID based
on xid for two-phase transactions, so it seems this patch has a
use-case.

Markus, feel free to update the docs, you might want to mention about
use-case of XID. Also, feel free to add an open item on PG-14 Open
Items page [2].

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BopiV4aFTmWWUF9h_32%3DHfPOW9vZASHarT0UA5oBrtGw%40mail.gmail.com
[2] - https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
Hello Amit,

On 21.03.21 11:53, Amit Kapila wrote:
> During a discussion of GID's in the nearby thread [1], it came up that
> the replication solutions might want to generate a different GID based
> on xid for two-phase transactions, so it seems this patch has a
> use-case.

thank you for reconsidering this patch.  I updated it to include the 
required adjustments to the documentation.  Please review.

> Markus, feel free to update the docs, you might want to mention about
> use-case of XID. Also, feel free to add an open item on PG-14 Open
> Items page [2].

Yes, will add.

Regards

Markus

Attachment

Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 22.03.21 09:50, Markus Wanner wrote:
> thank you for reconsidering this patch.  I updated it to include the 
> required adjustments to the documentation.  Please review.

I tweaked the wording in the docs a bit, resulting in a v3 of this patch.

Regards

Markus

Attachment

Re: [PATCH] Provide more information to filter_prepare

From
Amit Kapila
Date:
On Thu, Mar 25, 2021 at 2:07 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 22.03.21 09:50, Markus Wanner wrote:
> > thank you for reconsidering this patch.  I updated it to include the
> > required adjustments to the documentation.  Please review.
>
> I tweaked the wording in the docs a bit, resulting in a v3 of this patch.
>

One minor comment:
-      The callback has to provide the same static answer for a given
-      <parameter>gid</parameter> every time it is called.
+       The callback may be invoked several times per transaction to decode and
+       must provide the same static answer for a given pair of

Why do you think that this callback can be invoked several times per
transaction? I think it could be called at most two times, once at
prepare time, then at commit or rollback time. So, I think using
'multiple' instead of 'several' times is better.

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Provide more information to filter_prepare

From
Amit Kapila
Date:
On Mon, Mar 29, 2021 at 11:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 25, 2021 at 2:07 PM Markus Wanner
> <markus.wanner@enterprisedb.com> wrote:
> >
> > On 22.03.21 09:50, Markus Wanner wrote:
> > > thank you for reconsidering this patch.  I updated it to include the
> > > required adjustments to the documentation.  Please review.
> >
> > I tweaked the wording in the docs a bit, resulting in a v3 of this patch.
> >
>
> One minor comment:
> -      The callback has to provide the same static answer for a given
> -      <parameter>gid</parameter> every time it is called.
> +       The callback may be invoked several times per transaction to decode and
> +       must provide the same static answer for a given pair of
>
> Why do you think that this callback can be invoked several times per
> transaction? I think it could be called at most two times, once at
> prepare time, then at commit or rollback time. So, I think using
> 'multiple' instead of 'several' times is better.
>

+       to it not being a unique identifier.  Therefore, other systems combine
+       the <parameter>xid</parameter> with a node identifier to form a
+       globally unique transaction identifier.

What exactly is the node identifier here? Is it a publisher or
subscriber node id? We might want to be a bit more explicit here?


-- 
With Regards,
Amit Kapila.



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 29.03.21 08:23, Amit Kapila wrote:
> On Mon, Mar 29, 2021 at 11:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Why do you think that this callback can be invoked several times per
>> transaction? I think it could be called at most two times, once at
>> prepare time, then at commit or rollback time. So, I think using
>> 'multiple' instead of 'several' times is better.

Thank you for reviewing.

That's fine with me, I just wanted to provide an explanation for why the 
callback needs to be stable.  (I would not want to limit us in the docs 
to guarantee it is called only twice.  'multiple' sounds generic enough, 
I changed it to that word.)

> What exactly is the node identifier here? Is it a publisher or
> subscriber node id? We might want to be a bit more explicit here?

Good point.  I clarified this to speak of the origin node (given this is 
not necessarily the direct provider when using chained replication).

An updated patch is attached.

Regards

Markus

Attachment

Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
Sorry, git tricked me.  Here's the patch including actual changes.

Regards

Markus

Attachment

Re: [PATCH] Provide more information to filter_prepare

From
Amit Kapila
Date:
On Mon, Mar 29, 2021 at 12:57 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 08:23, Amit Kapila wrote:
> > On Mon, Mar 29, 2021 at 11:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > What exactly is the node identifier here? Is it a publisher or
> > subscriber node id? We might want to be a bit more explicit here?
>
> Good point.  I clarified this to speak of the origin node (given this is
> not necessarily the direct provider when using chained replication).
>

This might or might not be valid for all logical replication solutions
but in the publisher-subscriber model, it would easily lead to
duplicate identifiers and block the replication. For example, when
there are multiple subscriptions (say - 2) for multiple publications
(again say-2), the two subscriptions are on Node-B and two
publications are on Node-A. Say both publications are for different
tables tab-1 and tab-2. Now, a prepared transaction involving
operation on both tables will generate the same GID. This will block
forever if someone has set synchronous_standby_names for both
subscriptions because Prepare won't finish till both the subscribers
prepare the transaction and due to conflict one of the subscriber will
never finish the prepare. I thought it might be better to use
subscriber-id (or unique replication-origin-id for a subscription) and
the origin node's xid as that will minimize the chances of any such
collision. We can reach this situation if the user prepares the
transaction with the same name as we have generated but we can suggest
user not to do this or we can generate an internal prepared
transaction name starting with pg_* and disallow prepared transaction
names from the user starting with pg_ as we do in some other cases.

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 29.03.21 11:13, Amit Kapila wrote:
> This might or might not be valid for all logical replication solutions
> but in the publisher-subscriber model, it would easily lead to
> duplicate identifiers and block the replication. For example, when
> there are multiple subscriptions (say - 2) for multiple publications
> (again say-2), the two subscriptions are on Node-B and two
> publications are on Node-A. Say both publications are for different
> tables tab-1 and tab-2. Now, a prepared transaction involving
> operation on both tables will generate the same GID.

I think you are misunderstanding.  This is about a globally unique 
identifier for a transaction, which has nothing to do with a GID used to 
prepare a transaction.  This *needs* to be the same for what logical is 
the same transaction.

What GID a downsteam subscriber uses when receiving messages from some 
non-Postgres-provided output plugin clearly is out of scope for this 
documentation.  The point is to highlight how the xid can be useful for 
filter_prepare.  And that serves transaction identification purposes.

Regards

Markus



Re: [PATCH] Provide more information to filter_prepare

From
Amit Kapila
Date:
On Mon, Mar 29, 2021 at 3:11 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 11:13, Amit Kapila wrote:
> > This might or might not be valid for all logical replication solutions
> > but in the publisher-subscriber model, it would easily lead to
> > duplicate identifiers and block the replication. For example, when
> > there are multiple subscriptions (say - 2) for multiple publications
> > (again say-2), the two subscriptions are on Node-B and two
> > publications are on Node-A. Say both publications are for different
> > tables tab-1 and tab-2. Now, a prepared transaction involving
> > operation on both tables will generate the same GID.
>
> I think you are misunderstanding.  This is about a globally unique
> identifier for a transaction, which has nothing to do with a GID used to
> prepare a transaction.  This *needs* to be the same for what logical is
> the same transaction.
>

Okay, but just in the previous sentence ("However, reuse of the same
<parameter>gid</parameter> for example by a downstream node using
multiple subscriptions may lead to it not being a unique
identifier."), you have explained how sending a GID identifier can
lead to a non-unique identifier for multiple subscriptions. And then
in the next line, the way you are suggesting to generate GID by use of
XID seems to have the same problem, so that caused confusion for me.

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 29.03.21 11:53, Amit Kapila wrote:
> Okay, but just in the previous sentence ("However, reuse of the same
> <parameter>gid</parameter> for example by a downstream node using
> multiple subscriptions may lead to it not being a unique
> identifier."), you have explained how sending a GID identifier can
> lead to a non-unique identifier for multiple subscriptions.

Maybe the example of the downstream node is a bad one.  I understand 
that can cause confusion.  Let's leave away that example and focus on 
the output plugin side.  v6 attached.

> And then
> in the next line, the way you are suggesting to generate GID by use of
> XID seems to have the same problem, so that caused confusion for me.

It was not intended as a suggestion for how to generate GIDs at all. 
Hopefully leaving away that bad example will make it less likely to 
appear related to GID generation on the subscriber.

Regards

Markus

Attachment

Re: [PATCH] Provide more information to filter_prepare

From
vignesh C
Date:


On Mon, Mar 29, 2021 at 3:30 PM Markus Wanner <markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 11:53, Amit Kapila wrote:
> > Okay, but just in the previous sentence ("However, reuse of the same
> > <parameter>gid</parameter> for example by a downstream node using
> > multiple subscriptions may lead to it not being a unique
> > identifier."), you have explained how sending a GID identifier can
> > lead to a non-unique identifier for multiple subscriptions.
>
> Maybe the example of the downstream node is a bad one.  I understand
> that can cause confusion.  Let's leave away that example and focus on
> the output plugin side.  v6 attached.
>
> > And then
> > in the next line, the way you are suggesting to generate GID by use of
> > XID seems to have the same problem, so that caused confusion for me.
>
> It was not intended as a suggestion for how to generate GIDs at all.
> Hopefully leaving away that bad example will make it less likely to
> appear related to GID generation on the subscriber.

Thanks for the updated patch. Patch applies neatly, make check and make check-world passes. The code changes look fine to me.

In documentation, I did not understand the bold contents in the below documentation:
+       The <parameter>ctx</parameter> parameter has the same contents as for
+       the other callbacks. The parameters <parameter>xid</parameter>
+       and <parameter>gid</parameter> provide two different ways to identify
+       the transaction.  For some systems, the <parameter>gid</parameter> may
+       be sufficient.  However, reuse of the same <parameter>gid</parameter>
+       may lead to it not being a unique identifier.  Therefore, other systems
+       combine the <parameter>xid</parameter> with an identifier of the origin
+       node to form a globally unique transaction identifier.
 The later
+       <command>COMMIT PREPARED</command> or <command>ROLLBACK
+       PREPARED</command> carries both identifiers, providing an output plugin
+       the choice of what to use.

I know that in publisher/subscriber decoding, the prepared transaction gid will be modified to either pg_xid_origin or pg_xid_subid(it is still being discussed in logical decoding of two-phase transactions thread, it is in not yet completely finalized) to solve the subscriber getting the same gid name. 
But in prepare_filter_cb callback, by stating "other systems ..." it is not very clear who will change the GID. Are we referring to publisher/subscriber decoding?

Regards,
Vignesh

Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 29.03.21 12:18, vignesh C wrote:
> But in prepare_filter_cb callback, by stating "other systems ..." it is 
> not very clear who will change the GID. Are we referring to 
> publisher/subscriber decoding?

Thanks for your feedback.  This is not about GIDs at all, but just about 
identifying a transaction.  I'm out of ideas on how else to phrase that. 
  Any suggestion?

Maybe we should not try to give examples and reference other systems, 
but just leave it at:

      The <parameter>ctx</parameter> parameter has the same contents as for
      the other callbacks. The parameters <parameter>xid</parameter>
      and <parameter>gid</parameter> provide two different ways to identify
      the transaction.  The later <command>COMMIT PREPARED</command> or
      <command>ROLLBACK PREPARED</command> carries both identifiers,
      providing an output plugin the choice of what to use.

That is sufficient an explanation in my opinion.   What do you think?

Regards

Markus



Re: [PATCH] Provide more information to filter_prepare

From
vignesh C
Date:
On Mon, Mar 29, 2021 at 4:22 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 12:18, vignesh C wrote:
> > But in prepare_filter_cb callback, by stating "other systems ..." it is
> > not very clear who will change the GID. Are we referring to
> > publisher/subscriber decoding?
>
> Thanks for your feedback.  This is not about GIDs at all, but just about
> identifying a transaction.  I'm out of ideas on how else to phrase that.
>   Any suggestion?
>
> Maybe we should not try to give examples and reference other systems,
> but just leave it at:
>
>       The <parameter>ctx</parameter> parameter has the same contents as for
>       the other callbacks. The parameters <parameter>xid</parameter>
>       and <parameter>gid</parameter> provide two different ways to identify
>       the transaction.  The later <command>COMMIT PREPARED</command> or
>       <command>ROLLBACK PREPARED</command> carries both identifiers,
>       providing an output plugin the choice of what to use.
>
> That is sufficient an explanation in my opinion.   What do you think?

The above content looks sufficient to me.  As we already explain this
earlier "The optional filter_prepare_cb callback is called to
determine whether data that is part of the current two-phase commit
transaction should be considered for decoding at this prepare stage or
later as a regular one-phase transaction at COMMIT PREPARED time."
which helps in understanding what filter_prepare_cb means.
Thoughts?

Regards,
Vignesh



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 29.03.21 13:04, vignesh C wrote:
> The above content looks sufficient to me.

Good, thanks.  Based on that, I'm adding v7 of the patch.

Regards

Markus

Attachment

Re: [PATCH] Provide more information to filter_prepare

From
vignesh C
Date:
On Mon, Mar 29, 2021 at 4:46 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 13:04, vignesh C wrote:
> > The above content looks sufficient to me.
>
> Good, thanks.  Based on that, I'm adding v7 of the patch.
>

Thanks for the updated patch.

@@ -440,7 +441,8 @@ pg_decode_rollback_prepared_txn(LogicalDecodingContext *ctx,
  * substring, then we filter it out.
  */
 static bool
-pg_decode_filter_prepare(LogicalDecodingContext *ctx, const char *gid)
+pg_decode_filter_prepare(LogicalDecodingContext *ctx, TransactionId xid,
+                                                const char *gid)
 {
        if (strstr(gid, "_nodecode") != NULL)
                return true;

Currently there is one test to filter prepared txn with gid having
"_nodecode". I'm not sure if we can have any tests based on xid, I'm
sure you might have thought about it, Have you intentionally not
written any tests as it will be difficult to predict the xid. I just
wanted to confirm my understanding.

Regards,
Vignesh



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 29.03.21 14:00, vignesh C wrote:
> Have you intentionally not
> written any tests as it will be difficult to predict the xid. I just
> wanted to confirm my understanding.

Yeah, that's the reason this is hard to test this with a regression 
test.  It might be possible to come up with a TAP test for this, but I 
doubt that's worth it, as it's a pretty trivial addition.

Regards

Markus



Re: [PATCH] Provide more information to filter_prepare

From
vignesh C
Date:
On Mon, Mar 29, 2021 at 5:40 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 14:00, vignesh C wrote:
> > Have you intentionally not
> > written any tests as it will be difficult to predict the xid. I just
> > wanted to confirm my understanding.
>
> Yeah, that's the reason this is hard to test this with a regression
> test.  It might be possible to come up with a TAP test for this, but I
> doubt that's worth it, as it's a pretty trivial addition.
>

Thanks, I don't have any more comments.

Regards,
Vignesh



Re: [PATCH] Provide more information to filter_prepare

From
Amit Kapila
Date:
On Mon, Mar 29, 2021 at 4:46 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
>
> On 29.03.21 13:04, vignesh C wrote:
> > The above content looks sufficient to me.
>
> Good, thanks.  Based on that, I'm adding v7 of the patch.
>

Pushed. In the last version, you have named the patch incorrectly.

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Provide more information to filter_prepare

From
Markus Wanner
Date:
On 30.03.21 10:33, Amit Kapila wrote:
> Pushed. In the last version, you have named the patch incorrectly.

Thanks a lot, Amit!

Regards

Markus